Abstract

We study error exponents for source coding with side information. Both achievable exponents and converse
bounds are obtained for the following two cases: lossless source coding with coded side information (SCCSI)
and lossy source coding with full side information (Wyner-Ziv). These results recover and extend several
existing results on source-coding error exponents and are tight in some circumstances. Our bounds have a
natural interpretation as a two-player game between nature and the code designer, with nature seeking to
minimize the exponent and the code designer seeking to maximize it. In the Wyner-Ziv problem our analysis
exposes a tension in the choice of test channel, with the optimal test channel balancing two competing
error events. The Gaussian and binary-erasure cases are examined in detail.

I. Introduction



In a typical lossy data compression problem a source is to be compressed by an encoder at a prescribed
rate so that a decoder may reproduce the source to within some desired fidelity (distortion). Sometimes
present, in addition to the data to be compressed, is some correlated information that can be utilized by a
second encoder that is able to send a separate message to the decoder. We refer to this kind of problem
as source coding with side information (SCSI). The set-up is depicted in Fig. 1, where a source X is
compressed by encoder one to a rate $R_1$ with the decoder having access to encoded side information Y,
compressed at rate $R_2$ by encoder two, as well as the compressed version of X from the first encoder.

The SCSI scenario arises in a variety of applications. For example, in video applications [1] X can
represent a current frame, and Y a separate correlated frame sent from a second encoder. Y can even
represent the frame(s) preceding the current frame X in the stream: while the previous frames are certainly
available to the encoder, the encoder's coding scheme can be simplified by not making use of this
information and leaving the decoder to exploit the interframe dependence. A second example can be
found in communication in networks with relays [2]. A source sends a message X to a sink in a network
containing a relay. One mode of operation for the relay is "compress and forward", i.e. for the relay to
send a compressed version of its observation, Y, of the source-sink message to the sink. This compressed
message can be used by the sink to further aid its decoding. SCSI appears in applications even beyond
communication; for example (with minor changes) it has been proposed as a model for rate-constrained
pattern recognition [3].

Fig. 1. The Source Coding with Side Information Problem

For the lossless problem with coded side information (SCCSI)¹ and the lossy problem with full side
information (Wyner-Ziv), the "rate region" problem, i.e. determining the rates required to meet a given
average distortion constraint, is solved. In this paper, we study these two problems from an error-exponent
standpoint. Our motivation for doing so is three-fold:

¹ Also known as the "One Helper" problem, Wyner's problem [4] or the Ahlswede-Körner problem [5].

• In the applications mentioned above the average distortion of a compression scheme is not the
only important metric. Indeed, a video compression system with good average performance but that
frequently yields poor images, or a communication system that suffers from frequent outages, is
usually deemed unacceptable. In addition to minimizing the average distortion, one would like to
minimize the fraction of time in which the images are poor or the relay is unable to help.

• In some important cases, there is no rate loss, meaning that there is no difference in the rate-distortion
performance between the SCSI problem and the problem in which the side information Y is available
to the encoder as well as the decoder. In particular, it is well known that this is true of both the
binary erasure and quadratic Gaussian forms of the Wyner-Ziv problem [6]. This raises the question
of whether these two systems are equivalent when performance is measured via error exponents
instead of the average distortion.

• Recently a connection has been established between error exponents in channel coding and the
stabilization of linear systems over noisy channels [7], and there is a known interdependence between
source- and channel-coding error exponents. Thus new techniques in source-coding error exponents
could aid our understanding of problems at the intersection of communication and control [8].



A. Contributions and Overview 

Our key contributions are achievable exponents and converse bounds for the SCCSI and Wyner-Ziv
problems. The conventional approach to proving coding theorems for these problems [9] relies on
typicality-based arguments and yields error exponents that are essentially zero. By using more sophisticated
coding techniques, we obtain lower bounds that are strictly positive for all achievable rates and distortions.
Both achievable exponents have a natural interpretation as a dynamic, two-player, zero-sum game between
nature and the code designer, with nature seeking to minimize the exponent and the code designer seeking
to maximize it. Play alternates between the two players, and the available actions for each stage correspond
to marginal or conditional probability distributions. At the end of the game, the actions selected by the
players together determine the joint distribution of all of the relevant random variables, which in turn
determines the achievable exponent. See Sections III-A and IV-A for more detail.



For the SCCSI problem, our upper bound uses a change-of-measure argument that is more refined than
the conventional approach [10, p. 268] and yields a formally better bound. This bound more accurately
captures the structure of the problem and might be applicable to other network information setups. The 
proof also uses the Karush-Kuhn-Tucker (KKT) conditions in a novel way to obtain cardinality bounds 
on the auxiliary random variable. 

For the Wyner-Ziv problem, we supply results for both the discrete-memoryless and Gaussian versions
of the problem. Our analysis indicates that the optimization of the coding scheme is a richer problem
than it is when the goal is to minimize the average distortion. In both cases, the encoder performs vector
quantization, with an associated test channel, followed by binning. When the goal is to minimize the
average distortion, the test channel should be chosen to be "clean" so that the binning error probability
vanishes but with a negligible exponent [9, Thm. 15.9.1]. When optimizing the error exponent, on the
other hand, this choice is poor because the overall exponent is dominated by binning errors. Choosing a
"noisy" test channel leads to a large binning error exponent but results in little information transmission
from the encoder to the decoder. This also leads to a poor overall exponent, because small atypicalities in
$(X^n, Y^n)$ lead to a distortion error. The optimum choice of the test channel balances these two competing
error events. This is illustrated numerically in Section V-A for the binary erasure version of the problem.
A similar tension arises in the problem of compression for distributed hypothesis testing [11].

Our results present evidence that, for both the binary erasure and Gaussian cases, there is likely a 
difference in the error exponents between conventional Wyner-Ziv and the version of the problem in which 
the side information is available at both encoder and decoder (an "exponent loss"). This is in contrast 
to the rate-distortion version of the problem, for which the two scenarios have identical performance. 
Determining whether the reliability functions are indeed different is an interesting topic for future work. 

An application of our results on discrete-memoryless Wyner-Ziv allows us to determine the reliability 
function exactly (for a range of rates) for the lossless functional source coding problem, in which the goal 
is to reproduce a function g(X) at the decoder (see Section IV-A).

In our coding scheme the optimum test channel depends crucially upon the source statistics (see Fig. 4),
which for the applications mentioned at the outset may not be known exactly. Thus, another implication
of our results is that video coding or relaying systems based on a Wyner-Ziv scheme are likely to require
detailed knowledge of the source distribution. This provides a theoretical justification for the observation
that good estimates of the correlation between source and side information are "critical to the performance"
of practical Wyner-Ziv coding schemes [12].



B. Other Prior Work 

Error exponents for both SCCSI and Wyner-Ziv were studied by Arutyunyan and Marutyan [13].
However, their results were not proven rigorously and appear to be unduly strong; they have recently been
retracted [14]. Kochman and Wornell [15] have recently studied achievable exponents for the Gaussian
Wyner-Ziv problem using lattices, and have conjectured an exponent loss in certain settings. Eswaran and
Gastpar [16] have established an achievable exponent for the Berger-Yeung problem [17], which subsumes
both of the problems studied here. Their approach is based on determining the rate of convergence in the
Markov lemma and is fundamentally different from the approach used here. It is not difficult to find cases
for which the achievable exponent presented here exceeds theirs.² Moreover, we shall see that the approach
used here reveals greater insight into both the design of coding schemes for these problems and theoretical
questions such as the exponent loss for the binary erasure and Gaussian Wyner-Ziv problems.

For the SCCSI problem, Csiszár and Körner [10, p. 268] provide an upper bound on the reliability
function. This bound is formally improved in the present paper by using a more refined change-of-measure
argument. For the Wyner-Ziv problem, Jayaraman and Berger [18] studied the exponent associated with
the binning error probability. One of the goals of this paper is to show that a binning error is only one of
two competing error events. In this sense, at the error exponent level the Wyner-Ziv problem resembles
the problem of distributed hypothesis testing [19].

The Wyner-Ziv problem is in a sense "dual" to the problem of channel coding with side information
(CCSI) (see [20], [21] for a precise statement). Comparing the results in this paper to error exponent
studies of the CCSI problem [22], [23], however, shows that this duality breaks down at the level of
error exponents. In particular, in the CCSI problem, the encoder can force the realization of the auxiliary
random variable to have a specified joint distribution with the side information. In the Wyner-Ziv problem,
however, the encoder must rely on the law of large numbers to ensure this. At the rate level, atypical
realizations can be ignored and this difference is immaterial. At the level of error exponents, on the other
hand, the two are quite different, and the Wyner-Ziv setup is more challenging. There is a substantial
literature on error exponents for simpler source coding problems such as lossless compression with full
side information [24], [25], [26], the Slepian-Wolf problem [27], [28], [29], and lossy compression without
side information [30], [31]. None of these problems involves optimization over an auxiliary random
variable, however, and we shall see that the presence of auxiliary random variables makes the error
exponent problem significantly richer.

² In fact, it is not difficult to find examples for which our exponents are infinite, while theirs is always finite.

C. Outline



The rest of the paper proceeds as follows. Section II gives definitions. Section III formally states the
SCCSI problem and contains our results and discussion. Similarly, Section IV formally states the
Wyner-Ziv problem and contains our results and discussion. Section V applies the Wyner-Ziv results to the
binary erasure and Gaussian problems. The proofs of the theorems are somewhat involved and can be found in
the appendices.



II. Preliminaries

A. Definitions and Notations 

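We use $\mathcal{P}(\mathcal{X})$ to denote the set of discrete probability distributions on $\mathcal{X}$ and
$\mathcal{C}(\mathcal{X} \to \mathcal{Y})$ to denote all channels from $\mathcal{X}$ to $\mathcal{Y}$. For
$P \in \mathcal{P}(\mathcal{X})$ and $V \in \mathcal{C}(\mathcal{X} \to \mathcal{Y})$, we write $P \times V$ to denote
the distribution of the pair $(X, Y) \in \mathcal{X} \times \mathcal{Y}$ in which $X$ is generated according to
$P(\cdot)$ and $Y$ is taken as the output of the channel $V$ whose input is $X$. For $P_X \in \mathcal{P}(\mathcal{X})$
and $P_{Y|X} \in \mathcal{C}(\mathcal{X} \to \mathcal{Y})$ we use $P_{XY}$ as shorthand for $P_X \times P_{Y|X}$.

We use $\mathbf{x}$ to denote vectors in $\mathcal{X}^n$; usually the length of the vector is clear from the context.
For any $\mathbf{x} \in \mathcal{X}^n$ we write $Q_{\mathbf{x}}(\cdot)$ as the empirical distribution or type of
$\mathbf{x}$. The set of all sequences of length $n$ with type $Q$ is denoted $T^n_Q$. The set of all type variables
$Q \in \mathcal{P}(\mathcal{X})$, i.e. those for which $T^n_Q \neq \emptyset$, is denoted $\mathcal{P}^n(\mathcal{X})$.
For $Q \in \mathcal{P}^n(\mathcal{X})$, we let $\mathcal{C}^n(Q, \mathcal{Y})$ denote the set of all
$W \in \mathcal{C}(\mathcal{X} \to \mathcal{Y})$ for which (1) $T^n_{Q \times W}$ is non-empty; and (2) in the case that
$Q(x) = 0$, $W(\cdot|x)$ takes the form $W(y|x) = |\mathcal{Y}|^{-1}$. For $\mathbf{x} \in \mathcal{X}^n$ and
$V \in \mathcal{C}(\mathcal{X} \to \mathcal{Y})$ we denote by $T_V(\mathbf{x})$ the set of sequences in $\mathcal{Y}^n$
having conditional type $V$ given $\mathbf{x}$. For a type $Q_Y \in \mathcal{P}^n(\mathcal{Y})$, we use the function
$k(Q_Y)$ to refer to a unique index in $\{1, \ldots, |\mathcal{P}^n(\mathcal{Y})|\}$ for that type.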
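The notions of type and conditional type can be made concrete with a small sketch. The function names below (`empirical_type`, `num_types`, `conditional_type`) are illustrative choices, not notation from the paper, and the alphabets are assumed to be given as explicit lists. Rows of the conditional type with $Q(x) = 0$ are set to the uniform distribution, mirroring condition (2) in the definition of $\mathcal{C}^n(Q, \mathcal{Y})$.

```python
from collections import Counter
from fractions import Fraction
from math import comb


def empirical_type(seq, alphabet):
    """Return the type (empirical distribution) of seq as exact fractions."""
    n = len(seq)
    counts = Counter(seq)
    return {a: Fraction(counts.get(a, 0), n) for a in alphabet}


def num_types(n, alphabet_size):
    """Number of types of length-n sequences over an alphabet of the given size.

    A type is a way of splitting n into alphabet_size non-negative counts.
    """
    return comb(n + alphabet_size - 1, alphabet_size - 1)


def conditional_type(x, y, x_alphabet, y_alphabet):
    """Conditional type V(b|a) of y given x.

    For symbols a that never occur in x, the row is set to the uniform
    distribution on the y alphabet (an illustrative convention).
    """
    joint = Counter(zip(x, y))
    x_counts = Counter(x)
    cond = {}
    for a in x_alphabet:
        for b in y_alphabet:
            if x_counts.get(a, 0) == 0:
                cond[(a, b)] = Fraction(1, len(y_alphabet))
            else:
                cond[(a, b)] = Fraction(joint.get((a, b), 0), x_counts[a])
    return cond
```

Exact fractions are used so that two sequences can be compared for equality of type without floating-point issues.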

When dealing with discrete random variables, all logarithms and exponents have base 2. We take
$0 \log 0 = 0$ and $\log 0 = -\infty$ based on continuity arguments. For a distribution or type $P$ we let $H(P)$
denote entropy. For strings $\mathbf{x}$, $\mathbf{y}$, we write $H(\mathbf{x}|\mathbf{y})$ as the conditional empirical
entropy. For a distribution $P_X$ and a channel $P_{Y|X}$ we write $I(P_X; P_{Y|X})$ for the mutual information between
$X$ and $Y$ supposing that $P_X \times P_{Y|X}$ governs the pair. $D(P \| Q)$ denotes the Kullback-Leibler (KL)
divergence between distributions $P$ and $Q$. We also use the standard definitions of conditional entropy,
conditional mutual information, and conditional KL divergence.
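As a minimal sketch of these quantities, the following functions compute $H(P)$, $D(P \| Q)$ and $I(P_X; P_{Y|X})$ in bits using the convention $0 \log 0 = 0$. Distributions are assumed to be lists of probabilities and a channel is assumed to be a list of rows with `channel[x][y]` equal to $P_{Y|X}(y|x)$; these representations are illustrative choices.

```python
from math import log2


def entropy(p):
    """Entropy H(P) in bits, with 0 log 0 taken to be 0."""
    return -sum(v * log2(v) for v in p if v > 0)


def kl_divergence(p, q):
    """KL divergence D(P||Q) in bits; infinite if P puts mass where Q has none."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return float("inf")
            total += pi * log2(pi / qi)
    return total


def mutual_information(px, channel):
    """I(P_X; P_{Y|X}) in bits, where channel[x][y] = P_{Y|X}(y|x)."""
    ny = len(channel[0])
    # Output marginal induced by the input distribution and the channel.
    py = [sum(px[x] * channel[x][y] for x in range(len(px))) for y in range(ny)]
    # Mutual information as the average divergence of each row from the output marginal.
    return sum(px[x] * kl_divergence(channel[x], py)
               for x in range(len(px)) if px[x] > 0)
```

For instance, a uniform binary input passed through a binary erasure channel with erasure probability 0.25 gives a mutual information of 0.75 bits.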

Whenever the range of a summation, maximization or minimization is clear we will use shorthand, e.g.
$\sum_{Q_X \in \mathcal{P}^n(\mathcal{X})} = \sum_{Q_X}$. We define $[x]^+ = \max(0, x)$.

For the Gaussian Wyner-Ziv problem logarithms and exponents have base $e$. For $K$ a variance or
covariance matrix, we write $f_K$ as a shorthand for a $\mathcal{N}(0, K)$ Gaussian density. For $(X, Y) \sim f_K$,
we write $f_{K_{Y|X}}$ for the conditional distribution of $Y$ given $X$ and write $K_{Y|X}$ for the conditional
covariance (matrix). $h(K)$ denotes the differential entropy of a Gaussian random variable with distribution $f_K$. A
subscript $K$ denotes that expectation or mutual information should be computed using $f_K$, e.g. $I_K(X; Y)$
is the mutual information between jointly Gaussian random variables $X$ and $Y$ whose joint density is $f_K$.
$D(K \| \tilde{K})$ denotes the KL divergence between two Gaussian random variables/vectors with densities $f_K$
and $f_{\tilde{K}}$.
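For zero-mean Gaussians the divergence has the standard closed form $\frac{1}{2}\left(\mathrm{tr}(\tilde{K}^{-1}K) - n + \ln\frac{\det \tilde{K}}{\det K}\right)$ in nats. The sketch below evaluates it for the $2 \times 2$ case only, with the inverse and determinant written out by hand; the function name `gaussian_kl_2x2` and the nested-tuple matrix representation are assumptions made for illustration.

```python
from math import log


def gaussian_kl_2x2(K, K_ref):
    """KL divergence (in nats) between N(0, K) and N(0, K_ref) for 2x2 covariances.

    Uses D = 0.5 * (trace(K_ref^{-1} K) - 2 + ln(det K_ref / det K)).
    """
    (a, b), (c, d) = K
    (p, q), (r, s) = K_ref
    det_k = a * d - b * c
    det_ref = p * s - q * r
    if det_k <= 0 or det_ref <= 0:
        raise ValueError("both covariance matrices must be positive definite")
    # K_ref^{-1} = (1/det_ref) * [[s, -q], [-r, p]], so the trace of
    # K_ref^{-1} K is (s*a - q*c - r*b + p*d) / det_ref.
    trace_term = (s * a - q * c - r * b + p * d) / det_ref
    return 0.5 * (trace_term - 2.0 + log(det_ref / det_k))
```

The divergence is zero when the two covariance matrices coincide and is not symmetric in its arguments.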



III. SCCSI Results and Discussion 

Let $(X_i, Y_i)_{i=1}^{\infty}$ be the output of a memoryless source with distribution $P_{XY}(x, y)$ on a finite
alphabet $\mathcal{X} \times \mathcal{Y}$. The encoders are deterministic functions
$f_1^n : \mathcal{X}^n \to \mathcal{M}_1$ and $f_2^n : \mathcal{Y}^n \to \mathcal{M}_2$. The first encoder
observes only the i.i.d. sequence $X^n$; the second encoder observes only the i.i.d. sequence $Y^n$. The decoder,
$\varphi^n : \mathcal{M}_1 \times \mathcal{M}_2 \to \mathcal{X}^n$, must reproduce $X^n$ using the messages from the
encoders.

For this problem the rate region was determined by Ahlswede and Körner [5] and by Wyner [4], who
showed that $R_1$, $R_2$ are achievable if

$$\exists\, S - Y - X \ \text{ s.t. } \ R_1 \ge H(X|S), \quad R_2 \ge I(Y; S).$$

The closure of the union of the pairs over all such $S$ gives the entire rate region.
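To illustrate the rate region, the following sketch computes the corner point $(H(X|S), I(Y;S))$ induced by a given test channel from $Y$ to $S$ under the Markov chain $S - Y - X$. The function name `helper_rate_point` and the nested-list representations `p_xy[x][y]` and `q_s_given_y[y][s]` are assumptions made for this example.

```python
from math import log2


def entropy(probs):
    """Entropy in bits, with 0 log 0 taken to be 0."""
    return -sum(p * log2(p) for p in probs if p > 0)


def helper_rate_point(p_xy, q_s_given_y):
    """Return (H(X|S), I(Y;S)) when S is generated from Y through q_s_given_y.

    p_xy[x][y] is the joint pmf of the source and side information;
    q_s_given_y[y][s] is the test channel, so S - Y - X is a Markov chain.
    """
    nx, ny, ns = len(p_xy), len(p_xy[0]), len(q_s_given_y[0])
    p_y = [sum(p_xy[x][y] for x in range(nx)) for y in range(ny)]
    # Joint pmf of (X, S) obtained by averaging the test channel over Y.
    p_xs = [[sum(p_xy[x][y] * q_s_given_y[y][s] for y in range(ny))
             for s in range(ns)] for x in range(nx)]
    p_s = [sum(p_xs[x][s] for x in range(nx)) for s in range(ns)]
    h_x_given_s = entropy([v for row in p_xs for v in row]) - entropy(p_s)
    i_ys = entropy(p_s) - sum(p_y[y] * entropy(q_s_given_y[y]) for y in range(ny))
    return h_x_given_s, i_ys
```

Taking $S = Y$ recovers the point $(H(X|Y), H(Y))$, while a constant $S$ gives $(H(X), 0)$, the two extremes of the region.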

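Let the decoder output be denoted $\hat{X}^n = \varphi^n(f_1^n(X^n), f_2^n(Y^n))$. Then the error probability is

$$P_e(f_1^n, f_2^n, \varphi^n) = \Pr(\hat{X}^n \neq X^n)$$

and we define the source coding with coded side information error exponent as

$$\eta(P_{XY}, R_1, R_2) = \lim_{\epsilon \downarrow 0} \limsup_{n \to \infty} -\frac{1}{n} \log \left[ \min_{f_1^n, f_2^n, \varphi^n} P_e(f_1^n, f_2^n, \varphi^n) \right] \tag{1}$$

where the minimization ranges over all encoders and decoders $f_1^n$, $f_2^n$, $\varphi^n$ such that

$$\log |\mathcal{M}_i| \le n(R_i + \epsilon). \tag{2}$$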
Our main results for SCCSI are as follows. 
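Theorem 1. Let $R_1$, $R_2$, $P_{XY} \in \mathcal{P}(\mathcal{X} \times \mathcal{Y})$ be given. Then

$$\eta(P_{XY}, R_1, R_2) \ge \eta_L(P_{XY}, R_1, R_2) = \inf_{Q_Y} \sup_{Q_{S|Y}} \inf_{\substack{Q_{X|YS}: \\ H(Q_X) \ge R_1}} \left\{ D(Q_{XYS} \| P_{XY} Q_{S|Y}) + \begin{cases} [R_1 + R_2 - H(Q_{X|S}|Q_S) - I(Q_Y; Q_{S|Y})]^+ & \text{if } I(Q_Y; Q_{S|Y}) \ge R_2 \\ [R_1 - H(Q_{X|S}|Q_S)]^+ & \text{if } I(Q_Y; Q_{S|Y}) < R_2 \end{cases} \right\} \tag{3}$$

where the joint distribution of $X, Y, S$ is $Q_Y Q_{S|Y} Q_{X|YS}$ and $S$ takes finitely many values.³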

The scheme to achieve this exponent is explained in detail in Appendix A. In brief, operating on a
type-by-type basis, the scheme is as follows. The second encoder quantizes its observation using the test
channel $Q_{S|Y}$ and, if necessary, bins the quantizations into $2^{nR_2}$ bins. The primary encoder assigns an
index from $\{1, \ldots, 2^{nR_1}\}$ to each string in the type class, using binning if necessary. In the case that the
primary encoder was able to communicate without binning, the decoder will make no error. Otherwise,
the decoder finds the pair of sequences with the smallest joint empirical entropy in the received bins
and outputs the X string.
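The decoding rule in the last step can be sketched as follows. This is only an illustration of minimum joint empirical entropy decoding over two received bins, not the full scheme of Appendix A; the names `joint_empirical_entropy` and `min_entropy_decode`, and the representation of bins as lists of tuples, are assumptions.

```python
from collections import Counter
from math import log2


def joint_empirical_entropy(x, s):
    """Entropy in bits of the joint type of the sequence pair (x, s)."""
    n = len(x)
    counts = Counter(zip(x, s))
    return -sum(c / n * log2(c / n) for c in counts.values())


def min_entropy_decode(x_bin, s_bin):
    """Return the x from the pair (x, s), x in x_bin and s in s_bin, that
    minimizes the joint empirical entropy. Ties go to the first pair found."""
    best_x, best_h = None, float("inf")
    for x in x_bin:
        for s in s_bin:
            h = joint_empirical_entropy(x, s)
            if h < best_h:
                best_x, best_h = x, h
    return best_x
```

For example, if the helper's bin contains only `(0, 1, 0, 1)` and the primary bin contains `(0, 1, 0, 1)` and `(0, 0, 1, 1)`, the first candidate is chosen because its joint type with the helper sequence has entropy 1 bit rather than 2 bits.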

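Theorem 2. Let $R_1$, $R_2$, $P_{XY} \in \mathcal{P}(\mathcal{X} \times \mathcal{Y})$ be given, and suppose that
$P_{XY}(x, y) > 0$ for all $x$ and $y$. Then

$$\eta(P_{XY}, R_1, R_2) \le \eta_U(P_{XY}, R_1, R_2) = \inf_{Q_Y} \ \sup_{\substack{Q_{S|Y}: \\ I(Q_Y; Q_{S|Y}) \le R_2}} \ \inf_{\substack{Q_{X|Y}: \\ H(Q_{X|S}|Q_S) \ge R_1}} D(Q_{XY} \| P_{XY}) \tag{4}$$

where the joint distribution of $X, Y, S$ is $Q_Y Q_{X|Y} Q_{S|Y}$, i.e. $X$, $Y$ and $S$ form a Markov chain in that
order, and $S$ satisfies

$$|\mathcal{S}| \le |\mathcal{X}| \cdot |\mathcal{Y}| + |\mathcal{Y}| + 2. \tag{5}$$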



A. Discussion 

Both theorems can be viewed as a competitive game between two players, nature and the code designer.
Nature's goal is to minimize the exponent and the code designer's goal is to maximize it. The structure
of the problem determines the parameters and order of the plays. For example, in Theorem 1 nature
plays first, choosing a "worst-case" side information distribution. Then, knowing nature's choice, the code
designer picks the best codebook (via its choice of test channel). Nature plays last, choosing the worst
possible consistent joint distribution. Notice that the choices at each step match the "information" available
to the players.

³ Note that any choice of cardinality for S yields a valid achievable exponent.

A standard application of the change-of-measure argument [10, p. 268] provides the following upper
bound on the SCCSI exponent

$$\eta(P_{XY}, R_1, R_2) \le \eta_{sp}(P_{XY}, R_1, R_2) = \inf_{Q_{XY}} \ \sup_{\substack{Q_{S|Y}: \\ I(Q_Y; Q_{S|Y}) \le R_2}} \begin{cases} D(Q_{XY} \| P_{XY}) & H(Q_{X|S}|Q_S) \ge R_1 \\ \infty & H(Q_{X|S}|Q_S) < R_1, \end{cases} \tag{6}$$

where the sup is actually a maximum since the objective is either $\infty$ or $D(Q_{XY} \| P_{XY})$. It is straightforward
to verify that $\eta_U \le \eta_{sp}$, and so formally $\eta_U$ provides an improvement upon the standard sphere-packing
upper bound. In the game-theoretic interpretation the $\eta_{sp}$ exponent is obtained by letting nature's play
reveal the joint distribution of the source and side information, and then the code designer plays, choosing
the best codebook. But in the SCCSI problem, the helper's test channel can only depend on the marginal
type of the side information. Thus our improved upper bound better captures the inherent structure of the
problem.

We remark that in this and the next section, the solutions to the optimization problems in the theorems 
can be approximated arbitrarily well by searching over a fine grid. We have not studied conditions under 
which the optimization problems may be solved more efficiently (e.g. using convexity), nor conditions
under which a min-max theorem may simplify the problems. This may be interesting future work.

The optimizations in Theorems 1 and 2 differ in several respects. Foremost, in Theorem 2 the innermost
optimization is over $Q_{X|Y}$, so that $X, Y, S$ adhere to the Markov structure, yet in the achievable
exponent this Markov constraint is not present. This differing Markov structure is also present in the
partial Wyner-Ziv exponent results of Jayaraman and Berger [18], [32], who attribute the gap between the
sphere-packing and random exponents (present even at low rates) in the binning exponent problem they
studied to this type of difference in the Markov structure. The other differences between $\eta_L$ and $\eta_U$ are
the range of the innermost optimization, the presence of the binning term in the achievable exponent
and the fact that the choice of test channel is restricted in the upper bound. (This latter difference can be
eliminated by adding the restriction $I(Q_Y; Q_{S|Y}) \le R_2$ to the choice of test channel in the lower bound,
which only weakens the result.)

Despite these differences, the bounds provided by the theorems do allow us to determine the error
exponent exactly in some special cases. When $R_2 = 0$, there is no possibility of encoding the side
information. Taking $S$ to be constant in both exponents, one recovers the standard point-to-point exponent

$$\inf_{Q_X: H(Q_X) \ge R_1} D(Q_X \| P_X).$$

More generally, if $R_2$ is sufficiently large and $R_1$ is sufficiently close to $H(X|Y)$, then one can show that
the achievable exponent (3) coincides with the upper bound in (6) and hence also (4). The proof of this
fact parallels the proof that the random-coding and sphere-packing bounds for channel coding coincide
above the critical rate.
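The point-to-point exponent above is simple enough to approximate by the grid search mentioned earlier in this section. The sketch below does this for a Bernoulli($p$) source; the function names and the grid resolution are illustrative assumptions, and the result is only accurate up to the grid spacing.

```python
from math import log2, inf


def binary_entropy(q):
    """Entropy in bits of a Bernoulli(q) distribution."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * log2(q) - (1 - q) * log2(1 - q)


def binary_kl(q, p):
    """D(Bernoulli(q) || Bernoulli(p)) in bits."""
    total = 0.0
    for a, b in ((q, p), (1 - q, 1 - p)):
        if a > 0:
            if b == 0:
                return inf
            total += a * log2(a / b)
    return total


def lossless_exponent(p, rate, grid=10000):
    """Grid approximation of inf over Q_X with H(Q_X) >= rate of D(Q_X || P_X)
    for a Bernoulli(p) source. Returns inf when no type is feasible."""
    best = inf
    for k in range(grid + 1):
        q = k / grid
        if binary_entropy(q) >= rate:
            best = min(best, binary_kl(q, p))
    return best
```

As expected, the exponent is zero for rates at or below the source entropy, strictly positive above it, and infinite for rates above one bit, where no binary type has sufficient entropy.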



IV. Wyner-Ziv Results and Discussion 

Let $(X_i, Y_i)_{i=1}^{\infty}$ be the output of a memoryless source with distribution $P_{XY}(x, y)$ on a finite
alphabet $\mathcal{X} \times \mathcal{Y}$. Let $\hat{\mathcal{X}}$ be the reproduction alphabet and
$d : \mathcal{X} \times \hat{\mathcal{X}} \to \mathbb{R}$ a single-letter distortion measure. Define
the distortion between two strings as $d(\mathbf{x}, \hat{\mathbf{x}}) = \frac{1}{n} \sum_{i=1}^{n} d(x_i, \hat{x}_i)$.

An encoder observes the i.i.d. source sequence $X^n$ and communicates a message using $nR$ bits (or nats)
to the decoder. The decoder combines the message with the side information to give its reproduction
$\hat{X}^n$. The encoder/decoder pair are functions $\psi : \mathcal{X}^n \to \mathcal{M}$ and
$\varphi : \mathcal{M} \times \mathcal{Y}^n \to \hat{\mathcal{X}}^n$, where $\mathcal{M}$ is a fixed set.
The rate region was determined by Wyner and Ziv [33], who showed that if the allowable distortion is
$\Delta$, then the required rate is given by

$$R_{WZ}(P_{XY}, \Delta) = \inf I(X; Z) - I(Y; Z),$$

where the infimum is over all auxiliary random variables $Z$ such that (1) $Z$, $X$, and $Y$ form a Markov
chain in this order and (2) there exists a function $\hat{X}$ such that

$$E[d(X, \hat{X}(Y, Z))] \le \Delta.$$

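Let $\hat{X}^n = \varphi(\psi(X^n), Y^n)$ be the decoder's output and define the error probability

$$P_e(\psi, \varphi, \Delta, d) = \Pr\left(d(X^n, \hat{X}^n) > \Delta\right). \tag{7}$$

We define the Wyner-Ziv error exponent to be

$$\theta(R, \Delta, P_{XY}, d) = \lim_{\epsilon \downarrow 0} \limsup_{n \to \infty} -\frac{1}{n} \log \left[ \min_{\psi, \varphi} P_e(\psi, \varphi, \Delta, d) \right] \tag{8}$$

where the minimization ranges over all encoder/decoder pairs satisfying

$$\log |\mathcal{M}| \le n(R + \epsilon). \tag{9}$$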

Our main results for the Wyner-Ziv problem are as follows. 
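a) Discrete Memoryless Case:

Theorem 3. Let $P_{XY} \in \mathcal{P}(\mathcal{X} \times \mathcal{Y})$ and $R > 0$, $\Delta > 0$, $d(\cdot, \cdot)$ be given. Then

$$\theta(R, \Delta, P_{XY}, d) \ge \inf_{Q_X} \sup_{Q_{Z|X}} \inf_{Q_Y} \sup_{f \in \mathcal{F}} \inf_{Q_{XYZ}} G_D[Q_{XYZ}, P_{XY}, f, d, \Delta, R] \tag{10}$$

where

$$G_D[Q_{XYZ}, P_{XY}, f, d, \Delta, R] = \begin{cases} D(Q_{XYZ} \| P_{XY} Q_{Z|X}) & \text{if } E_Q[d(X, f(Y, Z))] > \Delta \\ D(Q_{XYZ} \| P_{XY} Q_{Z|X}) + [R - I(Q_X; Q_{Z|X}) + I(Q_Y; Q_{Z|Y})]^+ & \text{if } E_Q[d(X, f(Y, Z))] \le \Delta \text{ and } I(Q_X; Q_{Z|X}) \ge R \\ \infty & \text{otherwise,} \end{cases}$$

$\mathcal{F} = \{f \mid f : \mathcal{Y} \times \mathcal{Z} \to \hat{\mathcal{X}}\}$, and $Z$ takes finitely many values.⁴ Note that in
the final minimization over $Q_{XYZ}$, $Q_{XZ}$ and $Q_Y$ are fixed to be those specified earlier in the optimization.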
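To make the three cases of the objective concrete, the sketch below evaluates $G_D$ for a single joint distribution $Q_{XYZ}$. It does not perform any of the nested optimizations in (10). The dictionary representations (`q_xyz` keyed by `(x, y, z)`, `p_xy` keyed by `(x, y)`), the function names, and the convention of returning infinity when the divergence is infinite are assumptions made for this illustration.

```python
from math import log2, inf


def marginal(q, idx):
    """Marginalize a dict pmf keyed by tuples onto the coordinates in idx."""
    out = {}
    for key, v in q.items():
        k = tuple(key[i] for i in idx)
        out[k] = out.get(k, 0.0) + v
    return out


def mutual_information(q_ab, q_a, q_b):
    """Mutual information in bits from a pairwise pmf and its two marginals."""
    return sum(v * log2(v / (q_a[(a,)] * q_b[(b,)]))
               for (a, b), v in q_ab.items() if v > 0)


def g_d(q_xyz, p_xy, f, d, delta, rate):
    """Evaluate the three-case objective G_D at one joint distribution Q_XYZ."""
    q_x, q_y, q_z = marginal(q_xyz, (0,)), marginal(q_xyz, (1,)), marginal(q_xyz, (2,))
    q_xz, q_yz = marginal(q_xyz, (0, 2)), marginal(q_xyz, (1, 2))
    # D(Q_XYZ || P_XY Q_{Z|X}), where Q_{Z|X} is induced by Q_XYZ itself.
    div = 0.0
    for (x, y, z), v in q_xyz.items():
        if v > 0:
            ref = p_xy.get((x, y), 0.0) * q_xz[(x, z)] / q_x[(x,)]
            if ref == 0:
                return inf
            div += v * log2(v / ref)
    distortion = sum(v * d(x, f(y, z)) for (x, y, z), v in q_xyz.items() if v > 0)
    if distortion > delta:
        return div  # distortion violated even with correct decoding
    i_xz = mutual_information(q_xz, q_x, q_z)
    i_yz = mutual_information(q_yz, q_y, q_z)
    if i_xz >= rate:
        return div + max(0.0, rate - i_xz + i_yz)  # binning is in use
    return inf  # no binning needed and distortion met: no error
```

Evaluated at the true joint distribution (zero divergence), the function returns 0 when the distortion constraint is violated and $[R - I(X;Z) + I(Y;Z)]^+$ when binning is in use, which matches the usual rate condition for Wyner-Ziv coding.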



For completeness, we state the upper bound, which can be proved easily following Marton's [30]
sphere-packing/change-of-measure proof for the point-to-point case.

Theorem 4. Let $P_{XY} \in \mathcal{P}(\mathcal{X} \times \mathcal{Y})$ and $R > 0$, $\Delta > 0$, $d(\cdot, \cdot)$ be given. Then

$$\theta(R, \Delta, P_{XY}, d) \le \inf_{Q_{XY}: R_{WZ}(Q_{XY}, \Delta) > R} D(Q_{XY} \| P_{XY}).$$

This result is analogous to the upper bound in (6) and is therefore not as strong as its SCCSI counterpart
(cf. (4)). We expect that this bound can be improved, although the technique used to obtain Theorem 2
does not seem to be applicable here. If this bound can be strictly improved in the binary erasure case, it
would imply an exponent loss (see Section V-A).

⁴ As we are providing an achievable exponent, any choice of cardinality for Z yields a valid achievable exponent.

b) Gaussian Case:

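Theorem 5. Let $(X_i, Y_i)_{i=1}^{\infty}$ be jointly Gaussian with zero means and covariance matrix

$$\Sigma = \begin{bmatrix} 1 & c_{XY} \\ c_{XY} & 1 \end{bmatrix} \tag{11}$$

and let $d(x, \hat{x}) = (x - \hat{x})^2$. Then for any $R > 0$, $\Delta > 0$, and $\Sigma$ as in (11),

$$\theta(R, \Delta, f_\Sigma, d) \ge \inf_{\sigma_X} \sup_{\rho_{XZ}} \inf_{\sigma_Y} \sup_{\lambda \in \Lambda} \inf_{\rho_{YZ}, \rho_{XY}} G_G[K, \tilde{K}, \lambda, \Delta, R] \tag{12}$$

where

$$G_G[K, \tilde{K}, \lambda, \Delta, R] = \begin{cases} D(K \| \tilde{K}) & \text{if } E_K[(X - \lambda(Y, Z))^2] > \Delta \\ D(K \| \tilde{K}) + [R - I_K(X; Z) + I_K(Y; Z)]^+ & \text{if } E_K[(X - \lambda(Y, Z))^2] \le \Delta \text{ and } I_K(X; Z) \ge R \\ \infty & \text{otherwise,} \end{cases} \tag{13}$$

$\Lambda = \{\lambda : \mathbb{R} \times \mathbb{R} \to \mathbb{R} : \lambda(y, z) = \alpha y + \beta z, \ \alpha, \beta \in [-M_\alpha, M_\alpha]\}$,
the covariance matrix of $(X, Y, Z)$ is

$$K = \begin{bmatrix} \sigma_X^2 & \sigma_X \sigma_Y \rho_{XY} & \sigma_X \rho_{XZ} \\ \sigma_X \sigma_Y \rho_{XY} & \sigma_Y^2 & \sigma_Y \rho_{YZ} \\ \sigma_X \rho_{XZ} & \sigma_Y \rho_{YZ} & 1 \end{bmatrix}$$

and

$$\tilde{K} = \begin{bmatrix} 1 & c_{XY} & \dfrac{\rho_{XZ}}{\sigma_X} \\ c_{XY} & 1 & \dfrac{c_{XY} \rho_{XZ}}{\sigma_X} \\ \dfrac{\rho_{XZ}}{\sigma_X} & \dfrac{c_{XY} \rho_{XZ}}{\sigma_X} & 1 - \rho_{XZ}^2 + \dfrac{\rho_{XZ}^2}{\sigma_X^2} \end{bmatrix}. \tag{14}$$

$M_\alpha > 0$ is an arbitrary real number. The covariance matrix $\tilde{K}$ corresponds to a source $(X, Y, Z)$ where
$(X, Y) \sim \mathcal{N}(0, \Sigma)$, $Z$, $X$ and $Y$ form a Markov chain in that order, and the distribution of $Z$ conditional
on $X$ is taken from $K$.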

The theorem can be proven along the same lines as the discrete memoryless case, using a modified
notion of Gaussian types [34]. The full proof can be found in Appendix E.

Theorem 6. Let $(X_i, Y_i)$ be jointly Gaussian with zero means and covariance $\Sigma$ as in (11). Let
$R_{X|Y}(f_\Sigma, \Delta)$ denote the conditional rate-distortion function. Let $\bar{\theta}$ denote the error exponent
for a modified Gaussian Wyner-Ziv problem in which the side information is also available at the encoder. Then for any
$\Delta > 0$ and $R > R_{X|Y}(f_\Sigma, \Delta)$,

$$\bar{\theta}(R, \Delta, f_\Sigma, d) \le \inf_{\Pi: R_{X|Y}(f_\Pi, \Delta) > R} D(\Pi \| \Sigma) \tag{15}$$

where $\Pi$ is a $2 \times 2$ positive definite covariance matrix.

Proof: See Appendix F. ■

Corollary 1. Under the assumptions of Theorem 6 we have that

$$\theta(R, \Delta, f_\Sigma, d) \le \bar{\theta}(R, \Delta, f_\Sigma, d).$$

Proof: Any code that works for the Wyner-Ziv problem will work when the encoder also sees the
side information. This implies that the error exponent for the Wyner-Ziv problem is upper bounded by
the error exponent for the problem in which the side information is available at both the encoder and
decoder. ■

The upper bound in the Corollary is identical to the change-of-measure upper bound obtained via
Theorem 4. As with that bound, we believe that this upper bound can be improved, and showing a strict
improvement would establish an exponent loss.



A. Discussion 

As in the SCCSI case, in the Wyner-Ziv case the same game-theoretic interpretation holds, but there are 
more parameters and the game becomes more elaborate. Nature plays first, choosing the most "difficult" 
marginal distribution for X. The code designer plays next, selecting the "best" test channel for that 
difficult source. Nature plays again choosing the worst marginal distribution for the side information. 
Then, knowing everything chosen so far, the code designer chooses the estimation function. Nature has 
the final play, choosing the worst consistent joint distribution for the triple X, Y, Z. Once again the choices
and order of plays match the problem. 

The nature of the optimizations in Theorems 3 and 5 gives us some insight into the design of practical
coding schemes by revealing a tension, which we examine in detail in the next section for the binary
erasure and Gaussian problems. Briefly, we see that the objective functions $G_D$ (resp. $G_G$) contain three
cases which correspond to

• a violation of the distortion constraint even when the codeword is decoded correctly; 

• the use of binning, leading to the potential for decoding the wrong codeword; 

• no possibility for error. 

A large codebook allows for a cleaner quantization and hence lower chance of the first kind of event. But 
this large codebook comes with the requirement of binning, leading to the potential for the second kind 
of event. Thus these two kinds of errors are in tension. 

Theorem 3 allows us to determine a portion of the reliability function for a certain functional source
coding problem. If we wish to reproduce a function g(X) of the source X losslessly at the decoder, who
already has Y, then the rate required is $H_P(g(X)|Y)$, which follows from the results of Orlitsky and
Roche [35]. Setting the distortion measure to be

$$d(X, f(Y, Z)) = d_H(g(X), f(Y, Z))$$

($d_H$ is the Hamming measure) and evaluating Theorem 3 in the limit as $\Delta \to 0$ provides an achievable
exponent for this problem. This can be seen by always choosing $Q_{Z|X}$ so that $Z = g(X)$ and letting the
reproduction function be $f(Y, Z) = Z$. Using the fact that $Z - X - Y$ form a Markov chain, one can show that
the limit as $\Delta \to 0$ of the right-hand side of equation (10) is

$$\theta_L(R, P_{XY}) = \inf_{Q_{XY}: H_Q(g(X)) \ge R} D(Q_{XY} \| P_{XY}) + [R - H_Q(g(X)|Y)]^+. \tag{16}$$

An upper bound on the error exponent for this problem is given by

$$\theta_U(R, P_{XY}) = \inf_{Q_{XY}: H_Q(g(X)|Y) \ge R} D(Q_{XY} \| P_{XY}). \tag{17}$$

On account of the fact that both (16) and (17) are optimizations of a continuous function over compact
sets, the inf is attained. The relationship between these two functions is analogous to the relationship
between the sphere-packing and random coding exponents in channel coding [10, Lemma 2.5.4]. Thus
for $R > 0$ until some critical rate $R_c$ the reliability function for the functional source coding problem is
given exactly by

$$\min_{Q_{XY}: H_Q(g(X)|Y) \ge R} D(Q_{XY} \| P_{XY}).$$
V. Examples

A. Binary Erasure Case 

As an application of Theorem 3, we turn to the binary erasure version of the Wyner-Ziv problem. In
this case, X is uniformly distributed over the set $\{-1, +1\}$, and Y equals X passed through a binary
erasure channel with erasure probability $p$:

$$P(Y = 0 | X = 1) = p = 1 - P(Y = 1 | X = 1)$$
$$P(Y = 0 | X = -1) = p = 1 - P(Y = -1 | X = -1).$$

We would like to permit the reconstruction string to have erasures but not errors. The reconstruction
alphabet is thus

$$\hat{\mathcal{X}} = \{-1, 0, 1\}.$$

One way to avoid errors in the reconstruction string is to use the "erasure" distortion measure

$$d(x, \hat{x}) = \begin{cases} 0 & \text{if } \hat{x} = x \\ 1 & \text{if } \hat{x} = 0 \\ \infty & \text{otherwise.} \end{cases}$$

This distortion measure is overly harsh, however, in that it prohibits all errors. For the Wyner-Ziv problem,
higher rates can be achieved if one tolerates a vanishing probability of error. We will therefore consider
a finite approximation of this distortion measure,

$$d(x, \hat{x}) = \begin{cases} 0 & \text{if } \hat{x} = x \\ 1 & \text{if } \hat{x} = 0 \\ K & \text{otherwise,} \end{cases}$$

where $K$ is a large but fixed constant. We will examine the rate-distortion and reliability functions in the
limit as $K$ tends to infinity.

To determine the rate-distortion function in this case, let Z be the output of a binary erasure channel
with input X and erasure probability $\delta$. If Z, X, and Y form a Markov chain in this order, then it follows
that

$$I(X; Z) - I(Y; Z) = p(1 - \delta).$$

There is a natural choice of $f$ for this case

$$f(y, z) = \begin{cases} 1 & \text{if } z = 1 \text{ or } y = 1 \\ 0 & \text{if } z = 0 \text{ and } y = 0 \\ -1 & \text{otherwise.} \end{cases} \tag{18}$$

Then $E[d(X, f(Y, Z))] = p\delta$, and so any rate

$$R > [p - \Delta]^+$$

is achievable. To see that this is in fact the best possible, consider the problem in which the side information
$Y^n$ is available to both the encoder and the decoder. The rate-distortion function for this problem is given
by

$$\min_{p(\hat{x}|x, y)} I(X; \hat{X}|Y)$$

such that

$$E[d(X, \hat{X})] \le \Delta.$$

This minimization can be computed using classical techniques and shown in the limit as $K$ tends to
infinity to equal $[p - \Delta]^+$. It follows that $[p - \Delta]^+$ is the rate-distortion function for both problems. In
particular, there is no "rate loss" in the sense that the rate-distortion function is the same whether the side
information is available at both the encoder and decoder or at the decoder only.
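The two identities used above, $I(X;Z) - I(Y;Z) = p(1-\delta)$ and $E[d(X, f(Y,Z))] = p\delta$, can be checked numerically by enumerating the joint distribution. The sketch below does so under the stated Markov chain; all function names are illustrative, and the distortion is computed as the probability that the natural estimate is an erasure, since that estimate never produces an error.

```python
from math import log2


def erasure_joint(p, delta):
    """Joint pmf of (X, Y, Z): X uniform on {-1, +1}, Y and Z independent
    erasure-channel outputs of X with erasure probabilities p and delta."""
    q = {}
    for x in (-1, 1):
        for y, py in ((x, 1 - p), (0, p)):
            for z, pz in ((x, 1 - delta), (0, delta)):
                q[(x, y, z)] = 0.5 * py * pz
    return q


def pair_information(q, i, j):
    """Mutual information in bits between coordinates i and j of the pmf q."""
    joint, pa, pb = {}, {}, {}
    for key, v in q.items():
        a, b = key[i], key[j]
        joint[(a, b)] = joint.get((a, b), 0.0) + v
        pa[a] = pa.get(a, 0.0) + v
        pb[b] = pb.get(b, 0.0) + v
    return sum(v * log2(v / (pa[a] * pb[b])) for (a, b), v in joint.items() if v > 0)


def natural_estimate(y, z):
    """The reproduction function in (18)."""
    if y == 1 or z == 1:
        return 1
    if y == 0 and z == 0:
        return 0
    return -1


def rate_and_distortion(p, delta):
    """Return (I(X;Z) - I(Y;Z), E[d(X, f(Y,Z))]) for the erasure test channel."""
    q = erasure_joint(p, delta)
    rate = pair_information(q, 0, 2) - pair_information(q, 1, 2)
    # The estimate equals x unless both y and z are erased, which costs 1.
    distortion = sum(v for (x, y, z), v in q.items() if natural_estimate(y, z) == 0)
    return rate, distortion
```

Choosing $\delta = \Delta / p$ (for $0 < \Delta \le p$) makes the distortion exactly $\Delta$ and the rate $p - \Delta$, in agreement with the rate-distortion function stated above.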

We note that for the problem with side information at both the encoder and decoder, there is a simple
scheme that achieves the rate-distortion function $[p - \Delta]^+$. Since the encoder knows the locations of the
erasures in $Y^n$, it can simply communicate the value of $X^n$ in the first $nR$ erased locations.

We now turn to the application of Theorem 3 to this set-up. For simplicity of exposition, we will consider the optimization problem in (10) with two restrictions: (1) $Q_X$ is fixed to be the uniform distribution over $\{-1,+1\}$; and (2) we optimize $Q_{Z|X}$ over the class of binary erasure channels, instead of optimizing over the class of all test channels from $X$ to $Z$. The optimization problem in (10) then reduces to

\[ \sup_{Q_{Z|X}} \min_{Q_{Y|XZ}} G[Q_{XYZ}, P_{XY}, f, \Delta, R]. \]



This optimization problem can be written in the following alternative form

\[ \sup_{Q_{Z|X}} \min\bigl(G_1(Q_{Z|X}),\, G_2(Q_{Z|X})\bigr), \tag{19} \]

where

\[ G_1(Q_{Z|X}) = \min_{Q_{Y|XZ}} D(Q_{XYZ} \,\|\, P_{XY} Q_{Z|X}) \]

with the minimization being over all $Q_{Y|XZ}$ such that

\[ E_Q[d(X, f(Y,Z))] > \Delta, \]

and

\[ G_2(Q_{Z|X}) = \min_{Q_{Y|XZ}} D(Q_{XYZ} \,\|\, P_{XY} Q_{Z|X}) + [R - I_Q(X;Z) + I_Q(Y;Z)]^+, \]

with the optimization being over all $Q_{Y|XZ}$ such that

\[ E_Q[d(X, f(Y,Z))] \le \Delta, \]

and

\[ I_Q(X;Z) > R. \]

This last condition, of course, either holds for all choices of $Q_{Y|XZ}$ or for none of them.



The alternative form of the optimization problem given in (19) is useful because it shows that maximizing over the binary erasure test channel amounts to maximizing the minimum of the exponents of two error events. The first, $G_1(Q_{Z|X})$, is the exponent of the event that $Y^n$ and $Z^n$ together provide insufficient information about $X^n$ to enable the decoder to meet the distortion constraint; such an error occurs even if the codeword is decoded correctly. The second, $G_2(Q_{Z|X})$, is the exponent of the probability of a binning error.

These two error exponents are in tension in the following sense. Choosing $Q_{Z|X}$ to have a low probability of erasure communicates many of the bits in $X^n$ to the decoder via $Z^n$. This makes it unlikely that $Y^n$ and $Z^n$ will reveal too few bits about $X^n$ for the decoder to meet the distortion constraint, meaning that $G_1(Q_{Z|X})$ will be large. At the same time, choosing $Q_{Z|X}$ to have a low probability of erasure requires the use of a large codebook, which makes the binning error probability high, leading to a small $G_2(Q_{Z|X})$. On the other hand, choosing $Q_{Z|X}$ to have a high probability of erasure leads to exactly the opposite behavior: the binning error probability is small since little information is being communicated through $Z^n$, but it is much more likely that the realizations of $Y^n$ and $Z^n$ do not collectively reveal enough of the bits in $X^n$ to meet the distortion constraint.

This tension is illustrated in Fig. 2. The optimum choice of $Q_{Z|X}$ is given by a moderate erasure probability that balances the exponents of the two error probabilities. With this choice, both are dominant error events.

The exponent itself is shown for various $R$ in Fig. 3. Since we have not optimized over $Q_X$, this is properly interpreted as an upper bound on the error exponent of the scheme. Fig. 3 also shows the error exponent of the simple scheme mentioned above for achieving the rate-distortion function when the side information is available at both the encoder and the decoder. (This is also the upper bound in Theorem 4.) The error probability of this scheme is simply the probability that $Y^n$ contains more than $n(R + \Delta)$ erasures. Assuming $R > p - \Delta$, the exponent of this event is equal to

\[ D(R + \Delta \,\|\, p), \]

i.e., the relative entropy between two Bernoulli distributions, one with success probability $R + \Delta$ and one with success probability $p$. Fig. 3 shows that when the side information is available at both the encoder and decoder the exponent is higher than for our one-sided scheme. This suggests that there may be exponent loss, although considering non-erasure test channels may close this "gap".

Fig. 2. Tension in the choice of the test channel erasure probability $\delta$, revealed by Theorem 3 (horizontal axis: average distortion $p\delta$). Note that $p\delta$ is the average distortion of the system. Here $\Delta = 0.15$, $p = 0.5$, and $R = 0.425$.

Fig. 3. Upper bound on the error exponent of Theorem 3 and the error exponent of the scheme that makes use of side information at the encoder (horizontal axis: rate $R$ in bits per sample). The parameters $\Delta$, $p$ are the same as those used in Fig. 2.
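The exponent $D(R + \Delta \,\|\, p)$ of the informed scheme is straightforward to evaluate. The following is a minimal sketch, not the authors' code: it assumes the divergence is measured in bits (matching the rate axis of Fig. 3), and the handling of the two boundary regimes (rates at or below $p - \Delta$, and $R + \Delta \ge 1$) is an assumption added here for completeness.

```python
import math

def binary_kl(a, b, base=2.0):
    """Relative entropy D(a || b) between Bernoulli(a) and Bernoulli(b)."""
    def term(u, v):
        return 0.0 if u == 0 else u * math.log(u / v, base)
    return term(a, b) + term(1 - a, 1 - b)

def informed_exponent(rate, delta, p):
    """Exponent of the event that more than n(R + Delta) of n symbols are erased."""
    if rate <= p - delta:
        return 0.0       # the erasure fraction p already exceeds R + Delta
    if rate + delta >= 1:
        return math.inf  # more than n erasures in n symbols cannot happen
    return binary_kl(rate + delta, p)
```

With the parameters of Fig. 2 ($\Delta = 0.15$, $p = 0.5$, $R = 0.425$), `informed_exponent(0.425, 0.15, 0.5)` evaluates $D(0.575 \,\|\, 0.5)$.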
Fig. 4. Test channel optimization for Theorem 5. The plot shows the exponent against the test channel correlation $\rho_{xz}$, holding $\sigma_x = 1$ fixed for $R = 0.4$, $\zeta_{xy} = 0.7$ and $\Delta = 0.4$.

Fig. 5. A plot of the achievable exponent of Theorem 5 against the rate $R$ (nats per sample), with curves labeled "Our Exponent", "'Informed' Encoder" and "No Side information". Here $\zeta_{xy} = 0.7$ (the correlation coefficient between the source and side information) and $\Delta = 0.4$. $R(\Delta) = 0.121$ nats for these parameters.

B. Gaussian Case

B. Gaussian Case 

A similar test channel tension arises in the Gaussian case. This can be seen most clearly by considering the optimization problem over $\rho_{xz}$ for fixed $\sigma_x^2$. In Fig. 4 we plot

G'3(P2:2) = inf sup inf G [i^, S, A, A, i?] 

where we hold a\ = 1, and = 7^(1, cxy, 1, p^y, Pyz, Pxz) is the covariance matrix of (X, F, Z). 

Intuitively, $\rho_{xz}$ controls the number of different codewords we use to cover the source sequences. At rate $R$ the scheme allows us to identify at most $\exp(nR)$ codewords uniquely, and binning is required to go beyond this. A large codebook has the advantage that each source sequence can be mapped to a better (i.e. closer) codeword. As we increase the size of the codebook beyond this point, the gains from having a "cleaner" codebook are outweighed by the penalty we pay for binning. From the plot we can see there is an optimum choice that occurs around $\rho_{xz} = 0.76$ for the parameters of the plot.
Figure 5 shows the exponent plotted (by numerically solving the optimization problem) against the rate. For comparison the upper bound of Theorem 6 is included, as is the exponent for the no side information case, corresponding to the continuous version of Marton's point-to-point exponent [30]. This result was
proved by Ihara and Kubo OTIl . who showed the exponent is 

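\[ \inf_{\sigma^2 :\ \frac{1}{2}\log(\sigma^2/\Delta) > R} D\bigl(\mathcal{N}(0,\sigma^2) \,\|\, \mathcal{N}(0,1)\bigr) = \frac{1}{2}\bigl(\Delta\exp(2R) - \log(\Delta\exp(2R)) - 1\bigr). \tag{20} \]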

We can show our achievable exponent recovers ( [20| ) by taking the side information to be statistically 
independent i.e. = 0. In this case, one can show that p^y = Pyz = solve the inner optimization 
problem of pO] ). Further, since X JLY,Y cannot help achieve the distortion constraint, choosing ay = 1 
is nature's best play. With these choices we see that D(K\\K) = L'(/^2^||/i) and we are left with the 

following equivalent optimization (where we have written X = aZ) 

E[{X -Xf] > A or 
I{X;X) > R 
oo otherwise. 



inf sup 

"'X n^f ,cr^ 



As nature will always pick ax such that the supremum is finite, we are left with 

inf D{f^.Jf,). 

Expanding the divergence and appealing to the monotonicity of x — log x gives ( |20l ^ 

Using equation (20) and Theorem 6 we can determine the error exponent exactly when the side information is available at both the encoder and decoder. In this case, Wyner [36, section 3] provides a simple scheme to achieve the rate distortion function. The encoder simply subtracts the conditional mean $E[X|Y = y]$ from the source. An achievable exponent then follows by computing the point-to-point exponent for the random variable $X|Y = y$, which is again Gaussian, with mean $\zeta y$ and variance $1 - \zeta^2$. Our achievable exponent in this case is

\[ \frac{1}{2}\left(\frac{\Delta\exp(2R)}{1-\zeta^2} - \log\left(\frac{\Delta\exp(2R)}{1-\zeta^2}\right) - 1\right). \tag{21} \]

We now show that this is in fact the best we can do, by showing that (21) coincides with the upper bound of Theorem 6. The optimization problem of Theorem 6 can be solved as follows. We first note that if $X$, $Y$ are zero mean with covariance matrix $K$, then $\mathrm{Var}(X|Y) = \det(K)/(e_2^T K e_2)$. Hence we may write the problem as

\[ \inf_{K \succeq 0 :\ g(K,\Delta,R) \le 0} D(K \,\|\, \Sigma) \]

where $K \succeq 0$ means the matrix $K$ is positive semi-definite and $g(K,\Delta,R) = -\log\det(K) + \log(\Delta) + \log(e_2^T K e_2) + 2R$. The KKT conditions tell us the optimum $K^*$ must satisfy (with the factor $1/2$ of the divergence absorbed into the multiplier $\lambda \ge 0$)

1) $\Sigma^{-1} - (1+\lambda)(K^*)^{-1} + \lambda\,\dfrac{e_2 e_2^T}{e_2^T K^* e_2} = 0$

2) $\lambda\, g(K^*) = 0$.

One can solve this system to find

\[ K^* = \begin{pmatrix} \zeta^2 + \Delta\exp(2R) & \zeta \\ \zeta & 1 \end{pmatrix}. \]

Evaluating $D(K^*\|\Sigma)$ yields (21). Therefore, when the side information is available in both places we have determined the exponent exactly as (21).
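The claim that $D(K^*\|\Sigma)$ reproduces (21) can be checked numerically. The sketch below is an illustration under stated assumptions: unit-variance $X$ and $Y$ with correlation coefficient $\zeta$ (so $\Sigma$ has ones on the diagonal), natural logarithms, and the standard closed form for the divergence between zero-mean Gaussians. The helper names are hypothetical.

```python
import math

def gaussian_kl_2x2(K, S):
    """D(N(0,K) || N(0,S)) in nats for 2x2 covariance matrices (nested lists)."""
    det_k = K[0][0] * K[1][1] - K[0][1] * K[1][0]
    det_s = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    # trace(S^{-1} K), using the closed-form inverse of a 2x2 matrix
    trace = (S[1][1] * K[0][0] - S[0][1] * K[1][0]
             - S[1][0] * K[0][1] + S[0][0] * K[1][1]) / det_s
    return 0.5 * (trace - math.log(det_k / det_s) - 2.0)

def point_to_point_exponent(rate, delta, variance=1.0):
    """Equation (20) rescaled to a source of the given variance; zero below R(Delta)."""
    a = delta * math.exp(2.0 * rate) / variance
    return 0.5 * (a - math.log(a) - 1.0) if a > 1.0 else 0.0

def informed_gaussian_exponent(rate, delta, zeta):
    """Equation (21): the exponent for X | Y = y, which has variance 1 - zeta^2."""
    return point_to_point_exponent(rate, delta, 1.0 - zeta ** 2)

def k_star(rate, delta, zeta):
    """The candidate optimizer K* of the Theorem 6 problem."""
    a = delta * math.exp(2.0 * rate)
    return [[zeta ** 2 + a, zeta], [zeta, 1.0]]

if __name__ == "__main__":
    zeta, delta, rate = 0.7, 0.4, 0.5
    sigma = [[1.0, zeta], [zeta, 1.0]]
    print(gaussian_kl_2x2(k_star(rate, delta, zeta), sigma))
    print(informed_gaussian_exponent(rate, delta, zeta))
```

The two printed values should agree for any rate above $R(\Delta) = \frac{1}{2}\log((1-\zeta^2)/\Delta)$, which for the parameters of Fig. 5 is about $0.121$ nats.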



^Using a virtually identical argument one can show that exponent of Theorem [s] reduces to Marton's exponent for the discrete-memoryless 
case when the side information is independent of the source. 
Appendix A
Proof of Theorem 1

If $R_1 \ge \log|\mathcal{X}|$, then clearly $\eta(P_{XY}, R_1, R_2) = \infty$ and the result trivially holds, so we suppose that

\[ R_1 < \log|\mathcal{X}|. \]

A. Scheme 

We start by describing a scheme and then show the scheme has the performance specified in the theorem. Let $\epsilon > 0$ be given. For a given blocklength $n$, we operate on a type-by-type basis. The encoding and decoding functions are defined as follows.

Encoder 1: For each type class $T^n_{Q_X}$ with $\log|T^n_{Q_X}| > nR_1$ the encoder and decoder agree on a random binning scheme: for every sequence in $T^n_{Q_X}$, a bin index is assigned uniformly at random from $\{1,2,\ldots,\exp(nR_1)\}$. For the case that $\log|T^n_{Q_X}| \le nR_1$ each sequence is assigned a unique index. To encode a sequence $x$, the encoder sends the type and its index, $U_1(\cdot)$. Mathematically $f_1^n : \mathcal{X}^n \to \mathcal{M}_1$ is

\[ f_1^n(x) = (U_1(x), k(Q_x)), \]

where

\[ \mathcal{M}_1 = \mathcal{M}_1' \times \mathcal{M}_1'', \qquad \mathcal{M}_1' = \{1,2,\ldots,M_1 = \exp(nR_1)\}, \qquad \mathcal{M}_1'' = \{1,2,\ldots,(n+1)^{|\mathcal{X}|}\}. \]
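The type-by-type construction of Encoder 1 can be sketched for toy parameters by enumerating all sequences. This is an illustration only: the names `type_of` and `build_encoder_one` are hypothetical, the number of bins is taken to be $\lfloor \exp(nR_1) \rfloor$, indices start at 0 rather than 1, and the type itself (a tuple of symbol counts) stands in for the type index $k(Q_x)$.

```python
import math
import random
from collections import Counter
from itertools import product

def type_of(seq, alphabet):
    """Empirical type of seq as a tuple of counts, one per alphabet symbol."""
    counts = Counter(seq)
    return tuple(counts[a] for a in alphabet)

def build_encoder_one(n, alphabet, rate, rng):
    """Map every length-n sequence to (index, type) as in the Encoder 1 description."""
    num_bins = max(1, math.floor(math.exp(n * rate)))
    by_type = {}
    for seq in product(alphabet, repeat=n):
        by_type.setdefault(type_of(seq, alphabet), []).append(seq)
    table = {}
    for t, members in by_type.items():
        if len(members) > num_bins:
            # large type class: bin indices drawn uniformly at random
            for seq in members:
                table[seq] = (rng.randrange(num_bins), t)
        else:
            # small type class: every sequence receives its own index
            for idx, seq in enumerate(members):
                table[seq] = (idx, t)
    return table

if __name__ == "__main__":
    table = build_encoder_one(8, (0, 1), 0.3, random.Random(1))
    print(table[(0, 0, 0, 0, 0, 0, 0, 1)])
```

Sequences in small type classes are always recoverable from their (index, type) pair, which is why only the binned type classes contribute to the error probability analysed below.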

Encoder 2: For each type $Q_Y$, fix a conditional type $Q^*_{S|Y}(Q_Y) \in \mathcal{C}^n(Q_Y, \mathcal{S})$ and randomly choose a set of codewords $B^n(Q_Y)$ in the following way. The size of $B^n(Q_Y)$ is an integer satisfying

\[ \exp\bigl(nI(Q_Y; Q^*_{S|Y}(Q_Y)) + (|\mathcal{Y}||\mathcal{S}| + 2)\log(n+1)\bigr) \le |B^n(Q_Y)| \le \exp\bigl(nI(Q_Y; Q^*_{S|Y}(Q_Y)) + (|\mathcal{Y}||\mathcal{S}| + 4)\log(n+1)\bigr) \tag{22} \]

and the codewords are drawn uniformly, with replacement, from the marginal type class $T^n_{Q^*_S}$ induced by $Q_Y$ and $Q^*_{S|Y}(Q_Y)$. Define $S : T^n_{Q_Y} \to B^n(Q_Y)$ as follows. Let $\mathcal{G}(y) = B^n(Q_Y) \cap T^n_{Q^*_{S|Y}(Q_Y)}(y)$. If $\mathcal{G}(y)$ is non-empty, then the output of $S(y)$ is drawn uniformly at random from $\mathcal{G}(y)$ (codewords that appear multiple times are proportionally more likely to be selected). If $\mathcal{G}(y)$ is empty the output of $S(y)$ is drawn uniformly at random from $B^n(Q_Y)$. The function $S(\cdot)$ determines a quantization of $y$, the observation of the second encoder. We define $S^n = S(Y^n)$.

If $|B^n(Q_Y)| > \exp(nR_2)$ then the helper encoder assigns an index from the set $\{1,\ldots,\exp(nR_2)\}$ to each unique codeword in $B^n(Q_Y)$ uniformly at random; in the opposite case each element of $B^n(Q_Y)$ is assigned a unique index. In either case we let $U_2(s)$ denote the index assigned to $s \in B^n(Q_Y)$. To encode a sequence $y \in T^n_{Q_Y}$, the encoder sends the type of $y$ and the index, $U_2(S(y))$, of the quantization $S(y)$. Mathematically the second encoder, $f_2^n : \mathcal{Y}^n \to \mathcal{M}_2$, is specified as follows

\[ f_2^n(y) = (U_2(S(y)), k(Q_y)) \]

where

\[ \mathcal{M}_2 = \mathcal{M}_2' \times \mathcal{M}_2'', \qquad \mathcal{M}_2' = \{1,2,\ldots,M_2 = \exp(nR_2)\}, \qquad \mathcal{M}_2'' = \{1,2,\ldots,(n+1)^{|\mathcal{Y}|}\}. \]

Decoder:
The decoder receives two (bin) indices, $i$ from encoder one and $j$ from encoder two. It then attempts to jointly decode the pair $(x, s)$ using a minimum empirical entropy rule. That is, the decoder tries to find the pair $(\hat{x}, \hat{s})$ among all sequences in the corresponding bins satisfying $H(\tilde{x},\tilde{s}) > H(\hat{x},\hat{s})$ for every other pair $(\tilde{x},\tilde{s})$ in those bins. If there is no such pair it chooses $(\hat{x}, \hat{s})$ uniformly at random from the bins consistent with the received bin indices. Mathematically, this is:

\[ g^n(k(Q_x), i, k(Q_y), j) = \begin{cases} (\hat{x},\hat{s}) & \text{if } U_1(\hat{x}) = i,\ U_2(\hat{s}) = j \text{ and } H(\tilde{x},\tilde{s}) > H(\hat{x},\hat{s}) \\ & \quad \forall\, (\tilde{x},\tilde{s}) \ne (\hat{x},\hat{s}) \text{ with } U_1(\tilde{x}) = i,\ U_2(\tilde{s}) = j \\ \text{any } (\hat{x},\hat{s}) \text{ with } U_1(\hat{x}) = i,\ U_2(\hat{s}) = j & \text{if no such } (\hat{x},\hat{s}). \end{cases} \]

The decoder's final output is just the first element of the pair $g^n(k(Q_x), i, k(Q_y), j)$.
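The minimum empirical entropy rule can be sketched as follows. This is an illustrative toy, not the scheme's actual implementation: the two bins are passed in as explicit lists of candidate sequences, entropies are in nats, and a `None` return stands for the "no unique minimizer" branch in which the scheme would pick any consistent pair.

```python
import math
from collections import Counter

def joint_empirical_entropy(x, s):
    """Empirical joint entropy H(x, s) in nats of two equal-length sequences."""
    n = len(x)
    counts = Counter(zip(x, s))
    return -sum(c / n * math.log(c / n) for c in counts.values())

def min_entropy_decode(x_bin, s_bin):
    """Return the pair with strictly smallest joint empirical entropy, else None."""
    scored = sorted((joint_empirical_entropy(x, s), x, s)
                    for x in x_bin for s in s_bin)
    if len(scored) > 1 and math.isclose(scored[0][0], scored[1][0]):
        return None  # tie: no pair beats every other pair strictly
    return scored[0][1], scored[0][2]

if __name__ == "__main__":
    print(min_entropy_decode(["0000", "0101"], ["0000", "0110"]))
```

In the usage line, the pair of all-zero strings has joint empirical entropy zero and every other pair has strictly larger entropy, so it is the unique minimizer.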

B. Error Probability Calculation 

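To begin we define the following sets

\begin{align*}
\mathcal{E}_{r,1} &= \{(x,y,s) : H(Q_x) > R_1,\ |B^n(Q_y)| \le \exp(nR_2)\}, \\
\mathcal{D}_{r,1} &= \{Q_{XYS} : H(Q_X) > R_1,\ |B^n(Q_Y)| \le \exp(nR_2)\}, \\
\mathcal{E}_{r,2} &= \{(x,y,s) : H(Q_x) > R_1,\ |B^n(Q_y)| > \exp(nR_2)\}, \\
\mathcal{D}_{r,2} &= \{Q_{XYS} : H(Q_X) > R_1,\ |B^n(Q_Y)| > \exp(nR_2)\}, \\
\mathcal{E}_c &= \{(x,y,s) : s \notin T^n_{Q^*_{S|Y}(Q_y)}(y)\}, \\
\mathcal{D}_c &= \{Q_{XYS} : Q_{S|Y} \ne Q^*_{S|Y}(Q_Y)\},
\end{align*}

and the following event $F = \{\exists\, s \in B^n(Q_{Y^n}) : s \in T^n_{Q^*_{S|Y}(Q_{Y^n})}(Y^n)\}$.
The following lemmas will be required.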

Lemma 1. Let $X^n$, $Y^n$, $S^n$ be generated according to our scheme and suppose that $(x, y, s)$ is in $(\mathcal{E}_c)^c$, i.e., that $s \in T^n_{Q^*_{S|Y}(Q_y)}(y)$. Then

\begin{align}
\Pr(X^n = x, Y^n = y, S^n = s) \tag{23} \\
\le P^n_{XY}(x,y)\,\frac{1}{|T^n_{Q^*_{S|Y}(Q_y)}(y)|}. \tag{24}
\end{align}

Proof: For the $x$, $y$, $s$ in this lemma, $\{X^n = x, Y^n = y, S^n = s\}$ implies that the event $F$ has occurred. Thus

\begin{align*}
\Pr(X^n = x, Y^n = y, S^n = s) &= \Pr(X^n = x, Y^n = y, S^n = s, F) \\
&= P^n_{XY}(x,y)\,\Pr(F | X^n = x, Y^n = y)\,\Pr(S^n = s | X^n = x, Y^n = y, F) \\
&\le P^n_{XY}(x,y)\,\Pr(S^n = s | X^n = x, Y^n = y, F)
\end{align*}

where in the final line we used that conditional on $F$ and $\{Y^n = y\}$, $S^n$ is uniformly distributed over $T^n_{Q^*_{S|Y}(Q_y)}(y)$.
Lemma 2. Let $X^n$, $Y^n$, $S^n$ be generated according to our scheme and suppose that $(x, y, s) \in \mathcal{E}_c$. Then

\[ \Pr(X^n = x, Y^n = y, S^n = s) \le \exp(-(n+1)^2). \tag{25} \]

Proof: For the $x$, $y$, $s$ in this lemma, $\{X^n = x, Y^n = y, S^n = s\}$ implies that the event $F^c$ has occurred. Thus

\begin{align*}
\Pr(X^n = x, Y^n = y, S^n = s) &= \Pr(X^n = x, Y^n = y, S^n = s, F^c) \\
&= P^n_Y(y)\,\Pr(F^c | Y^n = y)\,\Pr(X^n = x | Y^n = y, F^c)\,\Pr(S^n = s | X^n = x, Y^n = y, F^c) \\
&\le \Pr(F^c | Y^n = y).
\end{align*}

$\Pr(F^c | Y^n = y)$ is the probability that there is no $s \in B^n(Q_y)$ so that $s \in T^n_{Q^*_{S|Y}(Q_y)}(y)$. We can give an upper bound on this probability using the properties of the codeword set. Let $m = |B^n(Q_y)|$ and $B^n(Q_y)[i]$ be the $i$th codeword in the set $B^n(Q_y)$. Then

\begin{align*}
\Pr(F^c | Y^n = y) &= \prod_{i=1}^m \Pr\bigl(B^n(Q_y)[i] \notin T^n_{Q^*_{S|Y}(Q_y)}(y)\bigr) \\
&= \prod_{i=1}^m \bigl[1 - \Pr\bigl(B^n(Q_y)[i] \in T^n_{Q^*_{S|Y}(Q_y)}(y)\bigr)\bigr] \\
&= \left(1 - \frac{|T^n_{Q^*_{S|Y}(Q_y)}(y)|}{|T^n_{Q^*_S}|}\right)^m \\
&\le \exp\left(-\frac{|T^n_{Q^*_{S|Y}(Q_y)}(y)|}{|T^n_{Q^*_S}|}\, m\right)
\end{align*}

where the last line followed by applying the inequality $(1-t)^m \le \exp(-tm)$. Next, using the following bounds on the cardinality of type classes [10, lemmas 2.3 and 2.6],

\begin{align*}
|T^n_{Q_S}| &\le \exp(nH(Q_S)) \\
|T^n_{Q_{S|Y}}(y)| &\ge (n+1)^{-|\mathcal{Y}||\mathcal{S}|} \exp(nH(Q_{S|Y}|Q_y))
\end{align*}

and that $I(Q^*_{S|Y}(Q_y); Q_y) = H(Q^*_S) - H(Q^*_{S|Y}(Q_y)|Q_y)$ we have

\[ \frac{|T^n_{Q^*_{S|Y}(Q_y)}(y)|}{|T^n_{Q^*_S}|} \ge (n+1)^{-|\mathcal{Y}||\mathcal{S}|} \exp\bigl(-nI(Q_y; Q^*_{S|Y}(Q_y))\bigr). \]

Thus,

\begin{align*}
\Pr(F^c | Y^n = y) &\le \exp\Bigl(-(n+1)^{-|\mathcal{Y}||\mathcal{S}|} \exp\bigl(-nI(Q_y; Q^*_{S|Y}(Q_y))\bigr)\, m\Bigr) \\
&\le \exp(-(n+1)^2)
\end{align*}

where the final line followed by substituting our choice of $m$ from (22). ■
Lemma 3. For any pair of strings $x$, $y$, let

\[ S(x,y) = \{(\tilde{x},\tilde{y}) \mid H(\tilde{x},\tilde{y}) \le H(x,y)\}. \]

Then

\[ |S(x,y)| \le (n+1)^{|\mathcal{X}||\mathcal{Y}|} \exp(nH(x,y)). \]

Proof:

\begin{align*}
|S(x,y)| &\le \sum_{Q_{XY} :\ H(Q_{XY}) \le H(x,y)} \exp(nH(Q_{XY})) \\
&\le (n+1)^{|\mathcal{X}||\mathcal{Y}|} \exp(nH(x,y)).
\end{align*}

■

Lemma 4. For any pair of strings $x$, $y$, let

\[ S(x|y) = \{\tilde{x} \mid H(\tilde{x}|y) \le H(x|y)\}. \]

Then

\[ |S(x|y)| \le (n+1)^{|\mathcal{X}||\mathcal{Y}|} \exp(nH(x|y)). \]

Proof: The proof mirrors that of Lemma 3 and is omitted. ■
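For very short blocklengths the counting bound of Lemma 3 can be checked by brute force. The sketch below is only a sanity check under stated assumptions: it enumerates all pairs of sequences over small alphabets (feasible only for toy $n$), treats the set $S(x,y)$ as defined with a non-strict inequality, and uses natural logarithms; the function names are illustrative.

```python
import math
from collections import Counter
from itertools import product

def empirical_entropy(seq):
    """Empirical entropy in nats of a sequence of hashable symbols."""
    n = len(seq)
    return -sum(c / n * math.log(c / n) for c in Counter(seq).values())

def count_low_entropy_pairs(x, y, alphabet_x, alphabet_y):
    """Brute-force |S(x, y)| together with the bound of Lemma 3."""
    n = len(x)
    target = empirical_entropy(list(zip(x, y)))
    pair_alphabet = list(product(alphabet_x, alphabet_y))
    # a pair of sequences is the same object as one sequence of symbol pairs
    count = sum(1 for cand in product(pair_alphabet, repeat=n)
                if empirical_entropy(cand) <= target + 1e-12)
    bound = (n + 1) ** (len(alphabet_x) * len(alphabet_y)) * math.exp(n * target)
    return count, bound

if __name__ == "__main__":
    print(count_low_entropy_pairs((0, 0, 0, 1, 0, 1), (0, 0, 1, 1, 0, 0),
                                  (0, 1), (0, 1)))
```

The polynomial factor $(n+1)^{|\mathcal{X}||\mathcal{Y}|}$ makes the bound loose at such small $n$; what matters for the exponent is only that the count grows no faster than $\exp(nH(x,y))$ up to a polynomial.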
Lemma 5. Let $(x, y, s) \in \mathcal{E}_{r,1} \cap (\mathcal{E}_c)^c$. Then

\[ \Pr(\hat{X}^n \ne X^n | X^n = x, Y^n = y, S^n = s) \le \exp\bigl(-n[R_1 - H(Q_{x|s}|Q_s) - \delta_n]^+\bigr), \tag{26} \]

where

\[ \delta_n = \frac{1}{n}\log (n+1)^{|\mathcal{S}||\mathcal{X}|}. \]

Proof: In the setting of this lemma the decoder knows $S^n$ since the quantization can be decoded unambiguously from the index $U_2(S^n)$. Thus, the decoding rule amounts to finding an $\tilde{x}$ string with lower conditional empirical entropy in the received bin. The set $S(x|s)$ (cf. Lemma 4) contains all the sequences with lower conditional empirical entropy (conditioned on $s$), but having the same type as $x$. Therefore we can bound the decoding error probability as

\begin{align*}
\Pr(\hat{X}^n \ne X^n | X^n = x, Y^n = y, S^n = s) &\le \sum_{\tilde{x} \in S(x|s)} \Pr(U_1(\tilde{x}) = U_1(x)) \\
&\le |S(x|s)| \exp(-nR_1) \\
&\le \exp\bigl(-n(R_1 - H(Q_{x|s}|Q_s) - \delta_n)\bigr)
\end{align*}

where the final line used the result from Lemma 4. Further bounding the probability by one gives the result. ■

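Lemma 6. Let $(x, y, s) \in \mathcal{E}_{r,2} \cap (\mathcal{E}_c)^c$. Then

\[ \Pr(\hat{X}^n \ne X^n | X^n = x, Y^n = y, S^n = s) \le 2\exp\Bigl(-n\bigl[R_1 + R_2 - H(Q_{x|s}|Q_s) - I(Q^*_{S|Y}(Q_y); Q_y) - \delta'_n\bigr]^+\Bigr) \tag{27} \]

where

\[ \delta'_n = \frac{1}{n}\log (n+1)^{|\mathcal{S}||\mathcal{X}| + |\mathcal{Y}||\mathcal{S}| + |\mathcal{S}| + 4}. \]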

Proof: Let us introduce the random variable $\hat{S}^n$ to denote the decoder's guess of the codeword $S^n$. Observe that the probability of the event of interest may be decomposed as follows

\begin{align}
\Pr(\hat{X}^n \ne X^n | X^n = x, Y^n = y, S^n = s) = {} & \Pr(\hat{X}^n \ne X^n, \hat{S}^n = S^n | X^n = x, Y^n = y, S^n = s) \nonumber \\
& + \Pr(\hat{X}^n \ne X^n, \hat{S}^n \ne S^n | X^n = x, Y^n = y, S^n = s). \tag{28}
\end{align}

We will show that both of the probabilities on the right are exponentially small and that the second summand dominates (in an error exponent sense) the first. To treat the first summand on the righthand side of (28) we begin by upper bounding the probability of $\{\hat{S}^n = S^n\}$ conditional on $\{X^n = x, Y^n = y, S^n = s\}$ by 1; we are then interested in bounding

\[ \Pr(\hat{X}^n \ne X^n | X^n = x, Y^n = y, S^n = s, \hat{S}^n = S^n). \]

The analysis in the proof of Lemma 5 shows that this probability is bounded by (26).



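For the second summand of (28) the event can occur only if there is a pair $(\tilde{x}, \tilde{s})$ with $\tilde{s} \in B^n(Q_y)$ such that the pair has lower joint empirical entropy than the true pair $(x, s)$ and falls in the same bins $U_1(x)$ and $U_2(s)$. Using the set $S(x, s)$ from Lemma 3 we can bound this probability as follows

\begin{align*}
& \Pr(\hat{X}^n \ne X^n, \hat{S}^n \ne S^n | X^n = x, Y^n = y, S^n = s) \\
& \le \sum_{\substack{(\tilde{x},\tilde{s}) \in S(x,s) \\ \tilde{x} \ne x \text{ and } \tilde{s} \ne s}} \Pr\bigl(U_1(\tilde{x}) = U_1(x),\ U_2(\tilde{s}) = U_2(s),\ \tilde{s} \in B^n(Q_y) \,\big|\, X^n = x, Y^n = y, S^n = s\bigr) \\
& = \sum_{\substack{(\tilde{x},\tilde{s}) \in S(x,s) \\ \tilde{x} \ne x \text{ and } \tilde{s} \ne s}} \Pr(U_1(\tilde{x}) = U_1(x))\,\Pr(\tilde{s} \in B^n(Q_y) | X^n = x, Y^n = y, S^n = s) \\
& \qquad\qquad \times \Pr(U_2(\tilde{s}) = U_2(s) | \tilde{s} \in B^n(Q_y), X^n = x, Y^n = y, S^n = s).
\end{align*}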
We now show that

\[ \Pr(\tilde{s} \in B^n(Q_y) | X^n = x, Y^n = y, S^n = s) \le \Pr(\tilde{s} \in B^n(Q_y)). \tag{29} \]

To establish this we will show that

\[ \Pr(S^n = s | X^n = x, Y^n = y, \tilde{s} \in B^n(Q_y)) \le \Pr(S^n = s | X^n = x, Y^n = y), \tag{30} \]

which implies the result by reversing the conditioning. Suppose first that $\tilde{s} \notin T^n_{Q^*_{S|Y}(Q_y)}(y)$; then

\begin{align*}
& \Pr(S^n = s | X^n = x, Y^n = y, \tilde{s} \in B^n(Q_y)) \\
& = \Pr(S^n = s, F | X^n = x, Y^n = y, \tilde{s} \in B^n(Q_y)) \\
& = \Pr(F | X^n = x, Y^n = y, \tilde{s} \in B^n(Q_y))\,\Pr(S^n = s | X^n = x, Y^n = y, \tilde{s} \in B^n(Q_y), F) \\
& \le \Pr(F | X^n = x, Y^n = y)\,\Pr(S^n = s | X^n = x, Y^n = y, \tilde{s} \in B^n(Q_y), F),
\end{align*}

where the inequality follows because dropping the conditioning event $\{\tilde{s} \in B^n(Q_y)\}$ frees up a position in the codebook, which increases the probability of $F$. Continuing we obtain

\begin{align*}
\Pr(S^n = s | X^n = x, Y^n = y, \tilde{s} \in B^n(Q_y)) &\le \Pr(F | X^n = x, Y^n = y)\,\Pr(S^n = s | X^n = x, Y^n = y, F) \\
&= \Pr(S^n = s, F | X^n = x, Y^n = y) \\
&= \Pr(S^n = s | X^n = x, Y^n = y),
\end{align*}

where the inequality used the conditional independence of $\{S^n = s\}$ and $\{\tilde{s} \in B^n(Q_y)\}$ given $F$. The case that $\tilde{s} \in T^n_{Q^*_{S|Y}(Q_y)}(y)$ can be handled by a straightforward coupling argument. Equation (30) is established.



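Applying a standard bound for the cardinality of a type class and (22) we see that

\begin{align*}
\Pr(\tilde{s} \in B^n(Q_y)) &\le \frac{|B^n(Q_y)|}{|T^n_{Q^*_S}|} \\
&\le \frac{\exp\bigl(nI(Q_y; Q^*_{S|Y}(Q_y)) + (|\mathcal{Y}||\mathcal{S}| + 4)\log(n+1)\bigr)\,(n+1)^{|\mathcal{S}|}}{\exp(nH(Q^*_S))}
\end{align*}

where $Q^*_S$ denotes the type induced by $Q_y$ and $Q^*_{S|Y}(Q_y)$. Additionally by the code construction, for $\tilde{x} \ne x$ and $\tilde{s} \ne s$ we have

\[ \Pr(U_1(\tilde{x}) = U_1(x))\,\Pr(U_2(\tilde{s}) = U_2(s) | \tilde{s} \in B^n(Q_y), X^n = x, Y^n = y) = \exp(-n(R_1 + R_2)). \]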

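These calculations, together with Lemma 3, imply that

\begin{align*}
& \Pr(\hat{X}^n \ne X^n, \hat{S}^n \ne S^n | X^n = x, Y^n = y, S^n = s) \\
& \le \exp\bigl(-n\bigl(R_1 + R_2 - H(Q_{x,s}) - I(Q_y; Q^*_{S|Y}(Q_y)) + H(Q^*_S) \\
& \qquad\qquad - n^{-1}(|\mathcal{S}||\mathcal{X}| + |\mathcal{S}| + 4 + |\mathcal{S}||\mathcal{Y}|)\log(n+1)\bigr)\bigr).
\end{align*}

We now note that $(x, y, s) \in (\mathcal{E}_c)^c$ implies that $s$ is a valid codeword and therefore $Q_s = Q^*_S$. By expanding $H(Q_{x,s})$ using the chain rule and canceling the $H(Q_s)$ terms in the previous display we obtain

\begin{align}
& \Pr(\hat{X}^n \ne X^n, \hat{S}^n \ne S^n | X^n = x, Y^n = y, S^n = s) \nonumber \\
& \le \exp\bigl(-n\bigl(R_1 + R_2 - H(Q_{x|s}|Q_s) - I(Q_y; Q^*_{S|Y}(Q_y)) \nonumber \\
& \qquad\qquad - n^{-1}(|\mathcal{S}||\mathcal{X}| + |\mathcal{S}| + 4 + |\mathcal{S}||\mathcal{Y}|)\log(n+1)\bigr)\bigr). \tag{31}
\end{align}

We then observe that $(x, y, s) \in \mathcal{E}_{r,2}$ implies

\[ \frac{(|\mathcal{Y}||\mathcal{S}| + 4)\log(n+1)}{n} > R_2 - I(Q_y; Q^*_{S|Y}(Q_y)) \]

and therefore that

\begin{align*}
& R_1 + R_2 - H(Q_{x|s}|Q_s) - I(Q_y; Q^*_{S|Y}(Q_y)) - n^{-1}(|\mathcal{S}||\mathcal{X}| + |\mathcal{S}| + 4 + |\mathcal{S}||\mathcal{Y}|)\log(n+1) \\
& \le R_1 - H(Q_{x|s}|Q_s) - n^{-1}(|\mathcal{S}||\mathcal{X}| + |\mathcal{S}|)\log(n+1) \\
& \le R_1 - H(Q_{x|s}|Q_s) - n^{-1}(|\mathcal{S}||\mathcal{X}|)\log(n+1).
\end{align*}

This calculation shows that the righthand side of (31) is larger than the righthand side of (26). To complete the proof we use the fact that $a + b \le 2\max(a, b)$ and keep the summand of (28) with the smaller exponent. ■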

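Lemma 7. Let $\delta_n$, $\delta'_n$, $\delta''_n$ be three sequences converging to zero. Let

\[ F_1^n(Q_{XYS}, R_1, R_2) = \begin{cases} [R_1 + R_2 - H(Q_{X|S}|Q_S) - I(Q_Y; Q_{S|Y}) - \delta'_n]^+ & \text{if } H(Q_X) > R_1 \text{ and } I(Q_{S|Y}; Q_Y) \ge R_2 - \delta''_n \\ [R_1 - H(Q_{X|S}|Q_S) - \delta_n]^+ & \text{if } H(Q_X) > R_1 \text{ and } I(Q_{S|Y}; Q_Y) < R_2 - \delta''_n \\ \infty & \text{otherwise,} \end{cases} \]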

*A similar reasoning can be used to verify the final inequality in the proof of Lemma 15 in 
\[ F_1(Q_{XYS}, R_1, R_2) = \begin{cases} [R_1 + R_2 - H(Q_{X|S}|Q_S) - I(Q_Y; Q_{S|Y})]^+ & \text{if } H(Q_X) > R_1 \text{ and } I(Q_{S|Y}; Q_Y) \ge R_2 \\ [R_1 - H(Q_{X|S}|Q_S)]^+ & \text{if } H(Q_X) > R_1 \text{ and } I(Q_{S|Y}; Q_Y) < R_2 \\ \infty & \text{otherwise} \end{cases} \]

and

\[ F^n(P_{XY}, R_1, R_2) = \min_{Q_Y} \max_{Q_{S|Y} \in \mathcal{C}^n(Q_Y, \mathcal{S})} \min_{Q_{XYS}} D(Q_{XYS} \,\|\, P_{XY} Q_{S|Y}) + F_1^n(Q_{XYS}, R_1, R_2) \tag{32} \]

\[ F^\infty(P_{XY}, R_1, R_2) = \inf_{Q_Y} \sup_{Q_{S|Y} \in \mathcal{C}(\mathcal{Y} \to \mathcal{S})} \inf_{Q_{XYS}} D(Q_{XYS} \,\|\, P_{XY} Q_{S|Y}) + F_1(Q_{XYS}, R_1, R_2), \tag{33} \]

where in (32) the optimizations are over types/conditional types and in (33) the optimizations are over all distributions, and in both cases the inner optimizations are compatible with the outer ones, i.e. assume $Q_{XYS} = Q_Y \times Q_{S|Y} \times Q_{X|SY}$. Then

\[ \liminf_{n \to \infty} F^n(P_{XY}, R_1, R_2) \ge F^\infty(P_{XY}, R_1, R_2). \]

Proof: Let Qxys ^ '^"('^ x 3^ x '5) solve the optimization problem in ( |32l ), i.e. 

F'^iPxY, Ri, R2) = DiQ^xYsWPxYQ^syY) + Pi iQxYs^ R^i R'i)- 
Along a subsequence that attains the lim inf in the statement of the Lemma there is a further subsequence 
Q^XYS ^^^^ converges, and so by relabeling this subsequence we can arrange it so that Q^xys ~^ Qxys 
Let S > 0. Then there exists a O^jy so that 



inf DiQxYs\\PxYQf\Y)+FiiQxYS,Rl,R2) 
Qxys-Qy=Qy 

Qs\Y=Q'^\Y 

> sup inf D{Qxys\\PxyQs\y) + Fi{Qxys,RuR2)-S. 

QxYS-QY=Qf 

Qs\Y=Qs\Y 

Furthermore, we may find a sequence Q^jy converging to Q^y We now choose 

qPys = arg min DiQxYs\\PxYQ^sl) + ^iXQ XYS^ Rli R2)- 



UYs&V"(xxyxsy. 

Qy=Q'y^ 

Qs\Y=Q^SlY 



Again by compactness and relabeling we may arrange it so that Qxys ~^ Qxys- Now we observe that 

minmax min D{Qxys\\PxyQs\y) + F^{Qxys, Ri, R2) 

Qy Qs\y Qxys 

= max min D{Qxys\\PxyQs\y) + F^^iQxYS, Ru R2) 

Qs\Y Qxys- 
Qy=q'y^ 
Qs\Y=Qs\Y 

> min D{Qxys\\PxyQ^^I) + F^{QxYS,RuR2) 

Qxys- 
Qy=Q^y^ 

Qs\y=Q^s\y 

= D(QS5ll^xyQS,|i^) + Fr(QS5,^i,^2). (34) 
We now prove that



'S\Y^ 

> DiQ^ysWPxvQfw) + F,{Q^ys,Ri,R2). (35) 

The case that Qxys ^^Lch. that H{Qx) < Ri, follows by continuity of entropy (since H{Q^^) < Ri 
for all n sufficiently large). In the opposite case, i.e. HiQ"^) > Ri, if H{Q^x^) < Ri for all n sufficiently 
large the left side is infinity and so the inequality must hold. For the remaining case that H{Qx^) > Ri 
for infinitely many n we split into sub-cases. Sub-case one: /goo (S; Y) < R2, then /^(„) (5; Y) < R2 — 5n 
for all n sufficiently large so the result is true by lower semi-continuity of the information measures. 
Sub-case two: Iq^^{S; Y) > R2, but then 

liminf[i?i - H{Q%\Qf) - 5^ = [Ri - H{Qx\s\QsT 

> [R, - H{Q^^s\Qf) + R2- HQy; QfwT- 
Therefore ([35]) is established. Taking the liminf in ( [341 ) and applying ( [35] ) yields 

liminfF"(Pxy,i?i,i?2) 

> D{Qxys\\PxyQs\y) + -^ilQxys'-Ri'-Ra) 



> inf D{QxYs\\PxYQf\Y) + Fi{QxYS,RuR2) 

Qxys-Qy=Q^ 

Q3\Y=Q'§\y 

> sup inf D{Qxys\\PxyQs\y) + Fi{QxYS,Ri,R2) - S 

Qs\Y Qxys-Qy=Qy 
Qs\Y=Qs\Y 

> inf sup inf _ D{Qxys\\PxyQs\y) + Fi{Qxys,RuR2)-S. 

Qy Qgjy Qxy.5-Qy=Qy 

Qs\Y=Qs\Y 

Letting 6—^0 gives the result. ■ 
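Proof of Theorem 1: To prove the theorem we will upper bound $P_e = \Pr(\hat{X}^n \ne X^n)$, the probability of error for our scheme. For any $\epsilon > 0$, we note that for $n$ sufficiently large the constraints in (2) are met. Define $\mathcal{E}_r = \mathcal{E}_{r,1} \cup \mathcal{E}_{r,2}$. Observe that on $(\mathcal{E}_r)^c$ the scheme makes no error, thus

\begin{align*}
P_e &= \sum_{(x,y,s) \in \mathcal{E}_r} \Pr(\hat{X}^n \ne X^n | X^n = x, Y^n = y, S^n = s)\,\Pr(X^n = x, Y^n = y, S^n = s) \\
&\le \sum_{(x,y,s) \in \mathcal{E}_{r,1} \cap (\mathcal{E}_c)^c} \Pr(\hat{X}^n \ne X^n | X^n = x, Y^n = y, S^n = s)\,\Pr(X^n = x, Y^n = y, S^n = s) \\
&\quad + \sum_{(x,y,s) \in \mathcal{E}_{r,2} \cap (\mathcal{E}_c)^c} \Pr(\hat{X}^n \ne X^n | X^n = x, Y^n = y, S^n = s)\,\Pr(X^n = x, Y^n = y, S^n = s) \\
&\quad + \sum_{(x,y,s) \in \mathcal{E}_c} \Pr(X^n = x, Y^n = y, S^n = s)
\end{align*}

where the final inequality follows by bounding the conditional error probability by 1 on $\mathcal{E}_c$. Applying Lemmas 1 and 5 to the summation over $\mathcal{E}_{r,1} \cap (\mathcal{E}_c)^c$, Lemmas 1 and 6 to the summation over $\mathcal{E}_{r,2} \cap (\mathcal{E}_c)^c$ and Lemma 2 to the summation over $\mathcal{E}_c$ we obtain

\begin{align*}
P_e &\le \sum_{(x,y,s) \in \mathcal{E}_{r,1} \cap (\mathcal{E}_c)^c} \exp\bigl(-n[R_1 - H(Q_{x|s}|Q_s) - \delta_n]^+\bigr)\,\frac{P^n_{XY}(x,y)}{|T^n_{Q^*_{S|Y}(Q_y)}(y)|} \\
&\quad + \sum_{(x,y,s) \in \mathcal{E}_{r,2} \cap (\mathcal{E}_c)^c} 2\exp\bigl(-n[R_1 + R_2 - H(Q_{x|s}|Q_s) - I(Q^*_{S|Y}(Q_y); Q_y) - \delta'_n]^+\bigr)\,\frac{P^n_{XY}(x,y)}{|T^n_{Q^*_{S|Y}(Q_y)}(y)|} \\
&\quad + \sum_{(x,y,s) \in \mathcal{E}_c} \exp(-(n+1)^2).
\end{align*}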

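Observing that the summation over $\mathcal{E}_c$ decays super-exponentially, we may safely omit this term, and use the notation $\dot{\le}$ to denote inequality to the first order of the exponent. Now summing first over types and then over sequences within the type class, we get

\begin{align}
P_e \ \dot{\le}\ \sum_{Q_Y} \Biggl[ & \sum_{Q_{XYS} \in \mathcal{D}_{r,1} \cap (\mathcal{D}_c)^c}\ \sum_{(x,y,s) \in T^n_{Q_{XYS}}} \exp\bigl(-n[R_1 - H(Q_{X|S}|Q_S) - \delta_n]^+\bigr)\,\frac{P^n_{XY}(x,y)}{|T^n_{Q^*_{S|Y}(Q_Y)}(y)|} \tag{36} \\
& + \sum_{Q_{XYS} \in \mathcal{D}_{r,2} \cap (\mathcal{D}_c)^c}\ \sum_{(x,y,s) \in T^n_{Q_{XYS}}} \exp\bigl(-n[R_1 + R_2 - H(Q_{X|S}|Q_S) - I(Q^*_{S|Y}(Q_Y); Q_Y) - \delta'_n]^+\bigr)\,\frac{P^n_{XY}(x,y)}{|T^n_{Q^*_{S|Y}(Q_Y)}(y)|} \Biggr] \tag{37}
\end{align}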



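where in the summation over joint types $Q_{XYS}$, the marginal type of $Y$ is fixed to be that set by the earlier summation. Using the following facts

\begin{align}
P^n_{XY}(x,y) &= \exp\bigl(-n(D(Q_{xy}\|P_{XY}) + H(Q_{xy}))\bigr) \nonumber \\
|T^n_{Q_{XYS}}| &\le \exp(nH(Q_{XYS})) \le \exp(n\log(|\mathcal{X}||\mathcal{Y}||\mathcal{S}|)) \tag{38} \\
|T^n_{Q_{S|Y}}(y)| &\ge (n+1)^{-|\mathcal{Y}||\mathcal{S}|}\exp(nH(Q_{S|Y}|Q_Y)) \tag{39}
\end{align}

and continuing from (36), we can further bound $P_e$ as follows

\begin{align}
P_e \ \dot{\le}\ \sum_{Q_Y} \Biggl[ & \sum_{Q_{XYS} \in \mathcal{D}_{r,1} \cap (\mathcal{D}_c)^c} \exp\Bigl(-n\bigl([R_1 - H(Q_{X|S}|Q_S) - \delta_n]^+ \nonumber \\
& \qquad + D(Q_{XY}\|P_{XY}) + H(Q_{XY}) + H(Q_{S|Y}|Q_Y) - H(Q_{XYS})\bigr)\Bigr) \nonumber \\
& + \sum_{Q_{XYS} \in \mathcal{D}_{r,2} \cap (\mathcal{D}_c)^c} \exp\Bigl(-n\bigl([R_1 + R_2 - H(Q_{X|S}|Q_S) - I(Q_Y; Q_{S|Y}) - \delta'_n]^+ \nonumber \\
& \qquad + D(Q_{XY}\|P_{XY}) + H(Q_{XY}) + H(Q_{S|Y}|Q_Y) - H(Q_{XYS})\bigr)\Bigr) \Biggr]. \tag{40}
\end{align}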

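Next we note that

\begin{align*}
& D(Q_{XY}\|P_{XY}) + H(Q_{XY}) + H(Q_{S|Y}|Q_Y) - H(Q_{XYS}) \\
& = D(Q_{XY}\|P_{XY}) + H(Q_{S|Y}|Q_Y) - H(Q_{S|XY}|Q_{XY}) \\
& = D(Q_{XYS}\|P_{XY}Q_{S|Y}),
\end{align*}

and substituting this identity into (40) gives

\begin{align*}
P_e \ \dot{\le}\ \sum_{Q_Y} \Biggl[ & \sum_{Q_{XYS} \in \mathcal{D}_{r,1} \cap (\mathcal{D}_c)^c} \exp\Bigl(-n\bigl([R_1 - H(Q_{X|S}|Q_S) - \delta_n]^+ + D(Q_{XYS}\|P_{XY}Q_{S|Y})\bigr)\Bigr) \\
& + \sum_{Q_{XYS} \in \mathcal{D}_{r,2} \cap (\mathcal{D}_c)^c} \exp\Bigl(-n\bigl([R_1 + R_2 - H(Q_{X|S}|Q_S) - I(Q_Y; Q_{S|Y}) - \delta'_n]^+ + D(Q_{XYS}\|P_{XY}Q_{S|Y})\bigr)\Bigr) \Biggr].
\end{align*}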
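The divergence identity used in this step, $D(Q_{XY}\|P_{XY}) + H(Q_{XY}) + H(Q_{S|Y}|Q_Y) - H(Q_{XYS}) = D(Q_{XYS}\|P_{XY}Q_{S|Y})$, holds for arbitrary distributions and can be spot-checked numerically. The sketch below draws random strictly positive distributions on small alphabets; the helper names are hypothetical and logarithms are natural.

```python
import math
import random
from itertools import product

def entropy(pmf):
    """Shannon entropy in nats of a pmf stored as {outcome: prob}."""
    return -sum(p * math.log(p) for p in pmf.values() if p > 0)

def marginal(joint, keep):
    """Marginalize a pmf keyed by tuples onto the coordinates listed in keep."""
    out = {}
    for key, p in joint.items():
        sub = tuple(key[i] for i in keep)
        out[sub] = out.get(sub, 0.0) + p
    return out

def both_sides(q_xys, p_xy):
    """Return (left side, right side) of the identity for Q_XYS and P_XY."""
    q_xy = marginal(q_xys, (0, 1))
    q_ys = marginal(q_xys, (1, 2))
    q_y = marginal(q_xys, (1,))
    d_xy = sum(q * math.log(q / p_xy[k]) for k, q in q_xy.items() if q > 0)
    h_s_given_y = entropy(q_ys) - entropy(q_y)
    lhs = d_xy + entropy(q_xy) + h_s_given_y - entropy(q_xys)
    rhs = sum(q * math.log(q / (p_xy[(x, y)] * q_ys[(y, s)] / q_y[(y,)]))
              for (x, y, s), q in q_xys.items() if q > 0)
    return lhs, rhs

def random_pmf(keys, rng):
    """A random strictly positive pmf on the given outcomes."""
    weights = [rng.random() + 0.01 for _ in keys]
    total = sum(weights)
    return {k: w / total for k, w in zip(keys, weights)}

if __name__ == "__main__":
    rng = random.Random(0)
    q = random_pmf(list(product(range(2), range(3), range(2))), rng)
    p = random_pmf(list(product(range(2), range(3))), rng)
    print(both_sides(q, p))
```

The identity is what allows the four entropy and divergence terms in (40) to collapse into the single divergence that appears in the exponent $F^n$.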

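We may now upper bound the summations by maximizing over the types and optimizing over the choice of test channel $Q_{S|Y}$. We now let $F_1^n$ be defined as in Lemma 7 and apply (22) to yield

\[ P_e \ \dot{\le}\ |\mathcal{P}^n(\mathcal{X} \times \mathcal{Y} \times \mathcal{S})|\,|\mathcal{P}^n(\mathcal{Y})| \max_{Q_Y}\ \min_{Q_{S|Y} \in \mathcal{C}^n(Q_Y,\mathcal{S})}\ \max_{Q_{XYS} \in (\mathcal{D}_c)^c} \exp\Bigl(-n\bigl(D(Q_{XYS}\|P_{XY}Q_{S|Y}) + F_1^n(Q_{XYS}, R_1, R_2)\bigr)\Bigr). \tag{41} \]

Let $F^n$ be as defined in (32). We may move the optimizations appearing in (41) into the exponent and this yields

\[ P_e \ \dot{\le}\ \exp\bigl(-n F^n(P_{XY}, R_1, R_2)\bigr). \]

Then we have

\begin{align*}
\liminf_{n \to \infty} -\frac{1}{n}\log P_e &\ge \liminf_{n \to \infty} -\frac{1}{n}\log\bigl(\exp(-n F^n(P_{XY}, R_1, R_2))\bigr) \\
&= \liminf_{n \to \infty} F^n(P_{XY}, R_1, R_2) \\
&\ge F^\infty(P_{XY}, R_1, R_2)
\end{align*}

where the final line followed by an application of Lemma 7. ■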

Appendix B
Proof of Theorem 2

Before proving Theorem 2, we prove two technical lemmas. We first prove the cardinality bound on $\mathcal{S}$ given in (5). This argument differs from conventional cardinality-bound proofs in that it uses the KKT conditions in addition to Caratheodory's theorem. We then prove a continuity lemma that is similar to Lemma 7. For the purposes of these lemmas define two new quantities

\[ \bar{\eta}_U(P_{XY}, R_1, R_2) = \inf_{Q_Y}\ \sup_{\substack{Q_{S|Y}:\ |\mathcal{S}| \le |\mathcal{X}||\mathcal{Y}| + |\mathcal{Y}| + 2 \\ I(Q_Y; Q_{S|Y}) \le R_2}}\ \inf_{\substack{Q_{X|Y}: \\ H(Q_{X|S}|Q_S) \ge R_1}} D(Q_{XY}\|P_{XY}) \]

and

\[ \tilde{\eta}_U(P_{XY}, R_1, R_2) = \inf_{Q_Y}\ \sup_{\substack{Q_{S|Y}: \\ I(Q_Y; Q_{S|Y}) \le R_2}}\ \inf_{\substack{Q_{X|Y}: \\ H(Q_{X|S}|Q_S) \ge R_1}} D(Q_{XY}\|P_{XY}). \]

Note that $\bar{\eta}_U$ differs from $\eta_U$ only in that the inequality in the inner-most infimum is no longer strict, and $\tilde{\eta}_U$ differs from $\bar{\eta}_U$ only in the omission of the cardinality bound on $\mathcal{S}$. Since for $R_1 \ge \log|\mathcal{X}|$, $\eta_U(P_{XY}, R_1, R_2) = \infty$ and Theorem 2 is trivial, we assume throughout this appendix that $R_1 < \log|\mathcal{X}|$.



Lemma 8. If $R_1 < \log|\mathcal{X}|$ and $P_{XY}(x, y) > 0$ for all $x$ and $y$, then $\bar{\eta}_U = \tilde{\eta}_U$.

Proof: Clearly $\tilde{\eta}_U \ge \bar{\eta}_U$. To show the reverse inequality, it suffices to show that for all $Q_Y$ and all $Q_{S|Y}$ such that $I(Q_Y; Q_{S|Y}) \le R_2$, there exists $Q_{\bar{S}|Y}$ such that

1) $I(Q_Y; Q_{\bar{S}|Y}) \le R_2$

2) $|\bar{\mathcal{S}}| \le |\mathcal{X}||\mathcal{Y}| + |\mathcal{Y}| + 2$

3) $\gamma(Q_Y, Q_{S|Y}) \le \gamma(Q_Y, Q_{\bar{S}|Y})$,

where

\[ \gamma(Q_Y, Q_{S|Y}) = \inf_{\substack{Q_{X|Y}: \\ H(Q_{X|S}|Q_S) \ge R_1}} D(Q_{XY}\|P_{XY}). \]

We will show that the $Q^*_{X|Y}$ achieving the infimum in $\gamma$ for a given $Q_{S|Y}$ must satisfy certain KKT conditions. Caratheodory's theorem will then be used to show that $Q_{S|Y}$ can be replaced by a cardinality-limited distribution for which the same $Q^*_{X|Y}$ again satisfies the KKT conditions, and therefore $Q^*_{X|Y}$ attains the infimum in this case also.

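Fix $Q_Y$ and $Q_{S|Y}$. For the $P_{XY}$ of the hypothesis, $\gamma(Q_Y, \cdot)$ has a continuous objective and a compact feasible set, so there exists $Q^*_{X|Y}$ such that

\[ \gamma(Q_Y, Q_{S|Y}) = D(Q_Y Q^*_{X|Y}\|P_{XY}) \]

and $H(Q^*_{X|S}|Q_S) \ge R_1$. Since the objective in this optimization problem is convex and the constraint is convex and strictly feasible, the optimizer $Q^*_{X|Y}$ must satisfy the KKT conditions for optimality [37, pg. 243]: there exist

\begin{align*}
\mu_{x,y} &\ge 0 \quad \text{for all } x, y \\
\lambda &\ge 0 \\
\nu_y &\ge 0 \quad \text{for all } y
\end{align*}

such that

\begin{align*}
Q(y)\left(\log\frac{Q^*(x|y)Q(y)}{P(x,y)} + 1 + \lambda\right) - \mu_{x,y} + \nu_y + \lambda\left(\sum_s Q(s)\,Q(y|s)\log\Bigl(\sum_{y'} Q^*(x|y')Q(y'|s)\Bigr)\right) &= 0 \quad \text{for all } x, y \\
\mu_{x,y}\,Q^*(x|y) &= 0 \quad \text{for all } x, y \\
\lambda\bigl(H(Q^*_{X|S}|Q_S) - R_1\bigr) &= 0 \\
\nu_y\Bigl(\sum_x Q^*(x|y) - 1\Bigr) &= 0 \quad \text{for all } y.
\end{align*}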

By Caratheodory's theorem ifTOl Ch. 3, Lemma 3.4], there exists Q{s) such that 

\s:Qis)>o\<\x\-\y\ + \y\ + 2 



and 



$^0(5)Q(Z/|5)=Q(1/) for ally 



Qiy)[^og p^^^^^ +i + Aj -^l^,y + yy 

+A(5^Q(s)(Q(i/|s)log(5^Q*(a;|2/')Q(?/'k)))) =0 forallx,!/ 

s y' 

i{Qs;Qy\s) = HQs;Qy\s) 

H{Q*x\s\Qs)=H{Q*^^s\Qs). 

'The assumption that PxY{x,y) > for all x and y guarantees that D{QyQx\y\\Pxy) is finite. If this quantity is infinite, then the 
KKT conditions may not hold at Qx\y- 






Define Qs\y via Qy\sQs/Qy- Then Qs\y is feasible because 

i{Qs;Qy\s) = i{Qs;Qy\s)<R2 

Given that the code designer selects the test channel Qs\y instead of Qs\y, Q*x\y ^^^^^ ^ feasible choice 
for nature because 

H{Q*x\s\Qs) > Ri 
Moreover, Q*x\y ^^^^ satisfies the KKT conditions and 

^^^H P{x,y) + 1 + - ^-.2/ + ""y 

+A(5^Q(s)(g(y|s)log(5^Q*(a;|y')Q(y'k)))) =0 forallx,y 

s y' 

fJ'x,yQ{x\y) = 
XiHiQ*j,^s\Qs)-Ri)=0 
^yEQ*(a;|l/)-l)=0 for ally. 

X 

Since 7(Qy, ■) is convex, the KKT conditions are also sufficient for optimality, and we have 

liQY,Qs\Y) = D{Qxy\\Pxy) = liQY,Qs\Y). 

m 

Lemma 9. For Ri < log \ we have 

limfjuiPxY, Ri + e,R2 + e) = VuiPxY, Ri, ^2)- 

e-S-O 

Proof: Clearly fju{PxY-,Ri + e, -R2 + e) > VuiPxY, Ri, R2) for all e > 0. To show the reverse 
inequality, fix a sequence e„ J, 0. Note that there exists Qy such thaj^ 

sup inf D{QyQx\y\\Pxy) < i'n.f sup inf D{Qxy\\Pxy) + 

QS\Y- Qs|y: Qx\Y- 

For each n, there exists such that 

inf D{Qx\yQ*y\\Pxy)> sup inf D{Qx\yQ*y\\Pxy) - 5. 

Qx\Y- Qs\Y- Qx\Y- 

H{Qx]s\QP)>Rl+^n I{Q'Y'Qs\Y)<R2+en H (Q x\s\Q s)>Ri+^n 

By considering subsequences, we may assume that 

Qg^Y ~^ QslY- 

Then there exists Qx\y ^^'^^ that 

HiQxislQf) > Riy 

and 

D{Q^\yQy\\Pxy) < inf D{QxiyQy\\Pxy) + S. 

Qx\Y- 

H{Qx^s\QW)>Ri 

'"Throughout this proof, Qs\y is assumed to satisfy the cardinahty bound ijsj. 
Note that for all sufficiently large n, we have
Then for all sufficiently large n. 



Vu{PxY,Ri + en,R2 + en) < sup inf D {Q x\yQ*y\\Pxy) 



Qs\ 



<IX\Y- 



I{QY'Qs\Y)<R2+en H{Qx\.s\Qs)>Rl+^n 

< inf D{QxiyQ*y\\Pxy) + S 

Qx\Y- 

H(Qx\s\Q's"^)>Ri+^n 

< D{Q'^^yQ*y\\PxY) + S. 

Thus 

limsnpfiu{PxY,Ri + en,R2 + en) <D{Q^^yQ*Y\\PxY) + S. (42) 

n—^oo 

On the other hand, we have 

D{Q^\yQy\\Pxy) < inf D{QxiyQy\\Pxy) + S 

Qx\Y- 

H{Qxis\QW)>Ri 

< sup inf D{Qx\yQy\\Pxy) + S 

I{Q*y\Qs\y)<R2 H{Qx\s\Qs)>Ri 
<r]uiPxY,Ri,R2) + 25. 

Combining this with ( [42] ) yields 

limsup?7t/(Pxy, Ri + en, R2 + e„) < r]u{PxY, Ri, R2) + 35, 

n— >-oo 

but (5 > and e„ — )• were arbitrary. ■ 
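Proof of Theorem 2: Recall that we may assume $R_1 < \log|\mathcal{X}|$. As we are eventually considering small $\epsilon$, we may assume that $R_1 + 2\epsilon < \log|\mathcal{X}|$. Take $n$ sufficiently large so that $\frac{1}{n} < \frac{\epsilon}{2}$. Let $(f_1^n, f_2^n, g^n)$ be any code satisfying (2) and let

$$\mathcal{E}^n(f_1^n, f_2^n, g^n) = \{(\mathbf{x}, \mathbf{y}) : g^n(f_1^n(\mathbf{x}), f_2^n(\mathbf{y})) \ne \mathbf{x}\}$$

denote its erroneous sequences. Take any $Q_{XY}$ such that

$$H_Q(X^n | f_2^n(Y^n)) \ge n(R_1 + 2\epsilon). \qquad (43)$$

We first show that for this choice of $Q_{XY}$ the following inequality holds:

$$Q^n_{XY}(\mathcal{E}^n(f_1^n, f_2^n, g^n)) \ge \frac{\epsilon}{2\log|\mathcal{X}|} > 0; \qquad (44)$$

we will then apply a change of measure argument. Fano's inequality gives

$$H(X^n | f_1^n(X^n), f_2^n(Y^n)) \le 1 + Q^n_{XY}(\mathcal{E}^n(f_1^n, f_2^n, g^n)) \, n \log|\mathcal{X}|. \qquad (45)$$

But

$$H(X^n, f_1^n(X^n) | f_2^n(Y^n)) = H(X^n | f_2^n(Y^n)) + H(f_1^n(X^n) | X^n, f_2^n(Y^n)) = H(X^n | f_2^n(Y^n))$$

$$= H(f_1^n(X^n) | f_2^n(Y^n)) + H(X^n | f_1^n(X^n), f_2^n(Y^n)).$$

Therefore

$$H(X^n | f_1^n(X^n), f_2^n(Y^n)) = H(X^n | f_2^n(Y^n)) - H(f_1^n(X^n) | f_2^n(Y^n))$$

$$\ge H(X^n | f_2^n(Y^n)) - H(f_1^n(X^n))$$

$$\ge H(X^n | f_2^n(Y^n)) - n(R_1 + \epsilon)$$

$$\ge n\epsilon. \qquad (46)$$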

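The fact that $\frac{1}{n} < \frac{\epsilon}{2}$ along with equations (45) and (46) gives (44). For $\delta > 0$ define the set

$$\mathcal{D}^n = \left\{ (\mathbf{x}, \mathbf{y}) : \left| \frac{1}{n} \log \frac{Q^n_{XY}(\mathbf{x}, \mathbf{y})}{P^n_{XY}(\mathbf{x}, \mathbf{y})} - D(Q_{XY} \| P_{XY}) \right| < \delta \right\}.$$

Fix $0 < a < \infty$ such that for all distributions $Q_{XY}$,

$$E_Q\left[ \log^2 \frac{Q(X, Y)}{P(X, Y)} \right] \le a.$$

Such an $a$ exists because the alphabet is finite and $P(x, y) > 0$ by assumption. By Chebyshev's inequality we have

$$Q^n_{XY}(\mathcal{D}^n) \ge 1 - \delta^{-2} E_Q\left[ \left( \frac{1}{n} \sum_{i=1}^n \log \frac{Q(X_i, Y_i)}{P(X_i, Y_i)} - D(Q_{XY} \| P_{XY}) \right)^2 \right]$$

$$\ge 1 - \frac{1}{n\delta^2} E_Q\left[ \log^2 \frac{Q(X, Y)}{P(X, Y)} \right]$$

$$\ge 1 - \frac{a}{n\delta^2}.$$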



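We may bound the error probability as follows:

$$P^n_{XY}(\mathcal{E}^n(f_1^n, f_2^n, g^n)) \ge Q^n_{XY}(\mathcal{E}^n(f_1^n, f_2^n, g^n) \cap \mathcal{D}^n) \exp(-n(D(Q_{XY} \| P_{XY}) + \delta))$$

$$\ge \left( \frac{\epsilon}{2\log|\mathcal{X}|} - \frac{a}{\delta^2 n} \right) \exp(-n(D(Q_{XY} \| P_{XY}) + \delta)). \qquad (47)$$

However, for large enough $n$,

$$\frac{\epsilon}{2\log|\mathcal{X}|} - \frac{a}{\delta^2 n} \ge \frac{\epsilon}{4\log|\mathcal{X}|} = \beta > 0;$$

thus, observing that the argument above holds for every $Q_{XY}$ satisfying (43), we see that

$$P^n_{XY}(\mathcal{E}^n(f_1^n, f_2^n, g^n)) \ge \sup_{Q_{XY}: \, H_Q(X^n | f_2^n(Y^n)) \ge n(R_1 + 2\epsilon)} \beta \exp(-n(D(Q_{XY} \| P_{XY}) + \delta)).$$

Now we note that the above holds for every code satisfying (2); thus, observing that the right hand side does not depend on $f_1^n$ or $g^n$, we conclude that

$$\min_{f_1^n, f_2^n, g^n} P^n_{XY}(\mathcal{E}^n(f_1^n, f_2^n, g^n)) \ge \min_{f_2^n} \ \sup_{Q_{XY}: \, H_Q(X^n | f_2^n(Y^n)) \ge n(R_1 + 2\epsilon)} \beta \exp(-n(D(Q_{XY} \| P_{XY}) + \delta)).$$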
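As a numerical illustration of the "for large enough $n$" step, the following minimal sketch computes the smallest blocklength at which the prefactor in (47) is at least $\beta$. It assumes the prefactor has the form $\epsilon/(2\log|\mathcal{X}|) - a/(\delta^2 n)$; the function names and the sample parameter values are illustrative only and are not taken from the paper.

```python
import math


def prefactor(n, eps, delta, a, alphabet_size):
    """Assumed bracket in (47): eps / (2 log|X|) - a / (delta^2 n)."""
    return eps / (2.0 * math.log(alphabet_size)) - a / (delta ** 2 * n)


def beta(eps, alphabet_size):
    """Constant beta = eps / (4 log|X|) used to lower-bound the prefactor."""
    return eps / (4.0 * math.log(alphabet_size))


def min_blocklength(eps, delta, a, alphabet_size):
    """Smallest n with a / (delta^2 n) <= eps / (4 log|X|)."""
    return math.ceil(4.0 * a * math.log(alphabet_size) / (eps * delta ** 2))


if __name__ == "__main__":
    # Hypothetical values: eps = 0.1, delta = 0.05, a = 2, |X| = 4.
    n0 = min_blocklength(0.1, 0.05, 2.0, 4)
    print(n0, prefactor(n0, 0.1, 0.05, 2.0, 4), beta(0.1, 4))
```

The threshold grows like $1/(\epsilon\delta^2)$, which is harmless here because $n \to \infty$ is taken before $\delta$ and $\epsilon$ are sent to zero.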
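We now move the optimizations into the exponent and focus our attention there.

$$\max_{\substack{f_2^n: \\ \log|f_2^n| \le n(R_2 + \epsilon)}} \ \inf_{\substack{Q_{XY}: \\ H_Q(X^n | f_2^n(Y^n)) \ge n(R_1 + 2\epsilon)}} D(Q_{XY} \| P_{XY})$$

$$= \max_{\substack{f_2^n: \\ \log|f_2^n| \le n(R_2 + \epsilon)}} \ \inf_{Q_Y} \ \inf_{\substack{Q_{X|Y}: \\ H_Q(X^n | f_2^n(Y^n)) \ge n(R_1 + 2\epsilon)}} D(Q_{XY} \| P_{XY})$$

$$\le \inf_{Q_Y} \max_{\substack{f_2^n: \\ \log|f_2^n| \le n(R_2 + \epsilon)}} \ \inf_{\substack{Q_{X|Y}: \\ H_Q(X^n | f_2^n(Y^n)) \ge n(R_1 + 2\epsilon)}} D(Q_{XY} \| P_{XY})$$

$$\le \inf_{Q_Y} \max_{\substack{f_2^n: \\ I(Y^n; f_2^n(Y^n)) \le n(R_2 + \epsilon)}} \ \inf_{\substack{Q_{X|Y}: \\ H_Q(X^n | f_2^n(Y^n)) \ge n(R_1 + 2\epsilon)}} D(Q_{XY} \| P_{XY})$$

$$\le \inf_{Q_Y} \sup_{\substack{Q_{U|Y^n}: \\ I(Y^n; U) \le n(R_2 + \epsilon)}} \ \inf_{\substack{Q_{X|Y}: \\ H_Q(X^n | U) \ge n(R_1 + 2\epsilon)}} D(Q_{XY} \| P_{XY}). \qquad (48)$$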

In the previous line, we note that the deterministic functions are still feasible and on deterministic functions 
the previous two bounds agree. Henceforth the joint distribution of X, Y, U is QyQu\yQx\y, so that X, Y 
and U form a Markov chain. To continue we use the following, obtained via the chain rule 

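$$H(X^n | U) = \sum_{i=1}^n H(X_i | U, X^{i-1})$$

$$\ge \sum_{i=1}^n H(X_i | U, X^{i-1}, Y^{i-1})$$

$$= \sum_{i=1}^n H(X_i | U, Y^{i-1}), \qquad (49)$$

where on the final line we used the fact that $X_i \leftrightarrow (U, Y^{i-1}) \leftrightarrow X^{i-1}$. The following identity also holds:

$$I(Y^n; U) = \sum_{i=1}^n I(Y_i; U | Y^{i-1})$$

$$= \sum_{i=1}^n \left[ H(Y_i | Y^{i-1}) - H(Y_i | Y^{i-1}, U) \right]$$

$$= \sum_{i=1}^n I(Y_i; Y^{i-1}, U). \qquad (50)$$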



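Substituting (49) into (48) makes the feasible set smaller because of the inequality. After substituting (50), we can continue to bound the exponent by

$$\le \inf_{Q_Y} \sup_{\substack{Q_{U|Y^n}: \\ \frac{1}{n} \sum_{i=1}^n I(Y_i; Y^{i-1}, U) \le R_2 + \epsilon}} \ \inf_{\substack{Q_{X|Y}: \\ \frac{1}{n} \sum_{i=1}^n H(X_i | U, Y^{i-1}) \ge R_1 + 2\epsilon}} D(Q_{XY} \| P_{XY})$$

$$= \inf_{Q_Y} \sup_{\substack{Q_{U|Y^n}: \\ \frac{1}{n} \sum_{i=1}^n I(Y_i; V_i) \le R_2 + \epsilon}} \ \inf_{\substack{Q_{X|Y}: \\ \frac{1}{n} \sum_{i=1}^n H(X_i | V_i) \ge R_1 + 2\epsilon}} D(Q_{XY} \| P_{XY}),$$

where on the previous line we let $V_i = (Y^{i-1}, U)$. Let $T$ denote a time-sharing random variable, uniformly distributed on $\{1, \ldots, n\}$ and independent of everything else. Then the quantity above can be written

$$\inf_{Q_Y} \sup_{\substack{Q_{U|Y^n}: \\ I(Y_T; V_T, T) \le R_2 + \epsilon}} \ \inf_{\substack{Q_{X|Y}: \\ H(X_T | V_T, T) \ge R_1 + 2\epsilon}} D(Q_{XY} \| P_{XY})$$

$$= \inf_{Q_Y} \sup_{\substack{Q_{U|Y^n}: \\ I(Y_T; W) \le R_2 + \epsilon}} \ \inf_{\substack{Q_{X|Y}: \\ H(X_T | W) \ge R_1 + 2\epsilon}} D(Q_{XY} \| P_{XY}), \qquad (*)$$

where we set $W = (V_T, T) = (Y^{T-1}, U, T)$. Since $(X_T, Y_T)$ has the same distribution as $(X, Y)$, the above quantity is upper bounded by

$$\inf_{Q_Y} \sup_{\substack{Q_{S|Y}: \\ I(Y; S) \le R_2 + 2\epsilon}} \ \inf_{\substack{Q_{X|Y}: \\ H(X | S) \ge R_1 + 2\epsilon}} D(Q_{XY} \| P_{XY}) = \eta_U(P_{XY}, R_1 + 2\epsilon, R_2 + 2\epsilon).$$

To see this, we note that every choice in (*) is a feasible choice in the latter optimization. In particular, for a given $Q_Y$, let $U^*$ denote a choice for $Q_{U|Y^n}$ in (*). Then choosing $S$ so that $(Y, S) = (Y_T, Y^{T-1}, U^*, T)$ is feasible. By Lemma 8 this quantity equals $\eta_U(P_{XY}, R_1 + 2\epsilon, R_2 + 2\epsilon)$. Thus we have shown that

$$\min_{f_1^n, f_2^n, g^n} P^n_{XY}(\mathcal{E}^n(f_1^n, f_2^n, g^n)) \ge \beta \exp(-n(\eta_U(P_{XY}, R_1 + 2\epsilon, R_2 + 2\epsilon) + \delta)).$$

Taking logs and the limsup as $n \to \infty$, and letting $\delta \downarrow 0$ and $\epsilon \downarrow 0$ (and invoking Lemma 9) gives the result. ■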

Appendix C 
Proof of Theorem [3] 

A. Scheme 

For a given blocklength $n$, we operate on a type-by-type basis and define the encoder and decoder functions as follows. For each type $Q_X$, fix a conditional type $Q^*_{Z|X}(Q_X) \in \mathcal{C}^n(Q_X, \mathcal{Z})$, a decoding function $f(Q_X, Q_Y) \in \mathcal{F}$, and randomly choose a set of codewords $B^n(Q_X)$ in the following way. The size of $B^n(Q_X)$ is an integer satisfying

$$\exp\big(n I(Q_X; Q^*_{Z|X}(Q_X)) + (|\mathcal{X}||\mathcal{Z}| + 2) \log(n + 1)\big) \le |B^n(Q_X)| \le \exp\big(n I(Q_X; Q^*_{Z|X}(Q_X)) + (|\mathcal{X}||\mathcal{Z}| + 4) \log(n + 1)\big) \qquad (51)$$

and the codewords are drawn uniformly, with replacement, from the marginal type class $T_{Q_Z}$ induced by $Q_X$ and $Q^*_{Z|X}(Q_X)$.

Define $Z : T_{Q_X} \to B^n(Q_X)$ as follows. Let $\mathcal{G}(\mathbf{x}) = B^n(Q_X) \cap T_{Q^*_{Z|X}(Q_X)}(\mathbf{x})$. If $\mathcal{G}(\mathbf{x})$ is non-empty, then the output of $Z(\mathbf{x})$ is drawn uniformly at random from $\mathcal{G}(\mathbf{x})$.$^{11}$ If $\mathcal{G}(\mathbf{x})$ is empty the output of $Z(\mathbf{x})$ is drawn uniformly at random from $B^n(Q_X)$. The function $Z(\cdot)$ determines the codeword sent by the encoder to the decoder. We define $Z^n = Z(X^n)$ and define the encoder's message set as follows:

$$\mathcal{M} = \mathcal{M}_1 \times \mathcal{M}_2,$$
$$\mathcal{M}_1 = \{1, 2, \ldots, M_1 = \exp(nR)\},$$
$$\mathcal{M}_2 = \{1, 2, \ldots, (n + 1)^{|\mathcal{X}|}\}.$$

$^{11}$Codewords that appear multiple times are proportionally more likely to be selected.

Operation of the encoder: To encode a sequence $\mathbf{x} \in T_{Q_X}$, the encoder sends the type of $\mathbf{x}$ and an index, $U(Z(\mathbf{x}))$, of the codeword $Z(\mathbf{x})$. There are two cases to consider:

1) $\log|B^n(Q_X)| \le nR$, in which case we can map each member of $B^n(Q_X)$ to an element of $\mathcal{M}_1$ in a one-to-one manner.

2) $\log|B^n(Q_X)| > nR$, in which case we assign each distinct member of $B^n(Q_X)$ to $\mathcal{M}_1$ uniformly at random.

Let $U(Z(\mathbf{x}))$ denote the element to which $Z(\mathbf{x})$ is mapped. The encoder can be expressed mathematically as

$$\psi(\mathbf{x}) = (U(Z(\mathbf{x})), k(Q_X)) \quad \text{for } \mathbf{x} \in T_{Q_X}. \qquad (52)$$

Operation of the Decoder: The decoder operates in a two-step manner. First it attempts to recover the codeword $Z^n$:

1) If $\log|B^n(Q_X)| \le nR$ then $Z^n$ can be decoded without error.

2) If $\log|B^n(Q_X)| > nR$ the decoder receives a bin index and uses the side information to pick the "best" $\mathbf{z}$ from the bin in the minimum conditional entropy sense: it searches for a $\hat{\mathbf{z}}$ in the received bin so that among all $\tilde{\mathbf{z}}$ in the bin, $H(\tilde{\mathbf{z}}|\mathbf{y}) > H(\hat{\mathbf{z}}|\mathbf{y})$. If there is no such $\hat{\mathbf{z}}$ it picks uniformly at random from the bin.

Let

$$\varphi_1(i, k(Q_X), \mathbf{y}) = \begin{cases} \hat{\mathbf{z}} & \text{if } \hat{\mathbf{z}} \in \mathrm{Bin}(i) \text{ and } \forall \tilde{\mathbf{z}} \in \mathrm{Bin}(i), \ \tilde{\mathbf{z}} \ne \hat{\mathbf{z}} : H(\tilde{\mathbf{z}}|\mathbf{y}) > H(\hat{\mathbf{z}}|\mathbf{y}) \\ \text{any } \mathbf{z} \in \mathrm{Bin}(i) & \text{if no such } \hat{\mathbf{z}} \in \mathrm{Bin}(i) \end{cases} \qquad (53)$$

where $\mathrm{Bin}(i) = \{\mathbf{z} : \mathbf{z} \in B^n(Q_X) \text{ and } U(\mathbf{z}) = i\}$ denotes the set of codewords that are assigned to index $i$. Second, the decoder uses the estimation function, $f$, to combine the side information $\mathbf{y}$ with codeword $\hat{\mathbf{z}}$ to give the reproduction $\hat{\mathbf{x}}$. This is expressed mathematically as

$$\varphi(i, k(Q_X), \mathbf{y}) = \hat{\mathbf{x}} \quad \text{s.t. } \hat{x}_j = f(\varphi_1(i, k(Q_X), \mathbf{y})_j, y_j). \qquad (54)$$
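The first decoding step can be sketched in a few lines. The following is a minimal illustration, assuming codewords and side information are given as equal-length tuples of symbols; the function names are hypothetical, and a return value of `None` stands in for the case where the scheme would pick uniformly at random from the bin.

```python
import math
from collections import Counter


def empirical_conditional_entropy(z, y):
    """Empirical conditional entropy H(z|y) in nats, from the joint type of (z, y)."""
    n = len(z)
    joint = Counter(zip(z, y))
    marginal_y = Counter(y)
    return -sum((c / n) * math.log(c / marginal_y[b])
                for (_, b), c in joint.items())


def decode_bin(bin_codewords, y, tol=1e-12):
    """Return the codeword that strictly minimizes H(z|y) over the bin.

    Returns None when the minimizer is not unique (up to tol); in that
    case the scheme described above picks uniformly from the bin.
    """
    scored = sorted((empirical_conditional_entropy(z, y), z)
                    for z in bin_codewords)
    if len(scored) == 1 or scored[1][0] - scored[0][0] > tol:
        return scored[0][1]
    return None


if __name__ == "__main__":
    y = (0, 0, 1, 1)
    print(decode_bin([(0, 1, 0, 1), (0, 0, 1, 1)], y))
```

Because the rule depends on the codewords only through their joint types with $\mathbf{y}$, the error analysis can be carried out by counting sequences in conditional type classes.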

B. Error probability calculation 

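It will be convenient to consider the following subsets of the sequence space:

$$\mathcal{E}_b = \{(\mathbf{x}, \mathbf{y}, \mathbf{z}) : \mathbf{z} \in T_{Q^*_{Z|X}(Q_X)}(\mathbf{x}), \ d(\mathbf{x}, f(\mathbf{y}, \mathbf{z})) \le \Delta, \ \log|B^n(Q_X)| > nR\}$$

$$\mathcal{E}_c = \{(\mathbf{x}, \mathbf{y}, \mathbf{z}) : \mathbf{z} \notin T_{Q^*_{Z|X}(Q_X)}(\mathbf{x})\}$$

$$\mathcal{E}_d = \{(\mathbf{x}, \mathbf{y}, \mathbf{z}) : \mathbf{z} \in T_{Q^*_{Z|X}(Q_X)}(\mathbf{x}), \ d(\mathbf{x}, f(\mathbf{y}, \mathbf{z})) > \Delta\}$$

$\mathcal{E}_b$ corresponds to a potential binning error, $\mathcal{E}_c$ to a covering error and $\mathcal{E}_d$ to a distortion error. We will consider the errors on these sets separately. Equivalently we can view these error events as properties of the joint type, so we define

$$\mathcal{D}_b = \{Q_{XYZ} : E_Q[d(X, f(Y, Z))] \le \Delta, \ Q_{Z|X} = Q^*_{Z|X}(Q_X), \ \log|B^n(Q_X)| > nR\}$$

$$\mathcal{D}_c = \{Q_{XYZ} : Q_{Z|X} \ne Q^*_{Z|X}(Q_X)\}$$

$$\mathcal{D}_d = \{Q_{XYZ} : E_Q[d(X, f(Y, Z))] > \Delta, \ Q_{Z|X} = Q^*_{Z|X}(Q_X)\}.$$

Before we proceed with the proof of Theorem 3, we establish the following useful facts.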
Lemma 10. Let $X^n$, $Y^n$, $Z^n = Z(X^n)$ be generated according to our scheme and suppose that $(\mathbf{x}, \mathbf{y}, \mathbf{z})$ is in $(\mathcal{E}_c)^c$, i.e. that $\mathbf{z} \in T_{Q^*_{Z|X}(Q_X)}(\mathbf{x})$. Then

$$\Pr(X^n = \mathbf{x}, Y^n = \mathbf{y}, Z^n = \mathbf{z}) \qquad (55)$$
$$\le P^n_{XY}(\mathbf{x}, \mathbf{y}) \frac{1}{|T_{Q^*_{Z|X}(Q_X)}(\mathbf{x})|}. \qquad (56)$$

Proof: The proof mirrors that of Lemma 1 and is omitted. ■

Lemma 11. Let $X^n$, $Y^n$, $Z^n = Z(X^n)$ be generated according to our scheme and suppose that $(\mathbf{x}, \mathbf{y}, \mathbf{z}) \in \mathcal{E}_c$. Then

$$\Pr(X^n = \mathbf{x}, Y^n = \mathbf{y}, Z^n = \mathbf{z}) \le \exp(-(n + 1)^2). \qquad (57)$$

Proof: The proof mirrors that of Lemma 2 and is omitted. ■
Lemma 12. For all strings $\mathbf{x}$, $\mathbf{z}$ such that $\mathbf{z} \in T_{Q_Z}$,

$$\Pr(\mathbf{z} \in B^n(Q_X)) \le (n + 1)^{|\mathcal{Z}|(1 + |\mathcal{X}|) + 4} \exp\big(n(I(Q_X; Q^*_{Z|X}(Q_X)) - H(Q_Z))\big).$$

Proof: By the construction of $B^n(Q_X)$, each of the codewords is chosen with replacement from the set $T_{Q_Z}$. Thus each string has probability $|T_{Q_Z}|^{-1}$ and we make $|B^n(Q_X)|$ such choices (bounded by (51)). From [10, Lemma 2.3] we have

$$|T_{Q_Z}| \ge (n + 1)^{-|\mathcal{Z}|} \exp(n H(Q_Z)).$$

Invoking the union bound gives the result. ■
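Lemma 13. Let $(\mathbf{x}, \mathbf{y}, \mathbf{z}) \in (\mathcal{E}_c)^c \cap (\mathcal{E}_d)^c$. Then

$$\Pr(d(X^n, \hat{X}^n) > \Delta | X^n = \mathbf{x}, Y^n = \mathbf{y}, Z^n = \mathbf{z}) \le \exp\big(-n |R - J(Q_{\mathbf{xyz}}) - \delta_n|^+\big) \qquad (58)$$

where

$$J(Q_{XYZ}) = I(Q_X; Q^*_{Z|X}(Q_X)) - I(Q_Y; Q_{Z|Y})$$

and $\delta_n = \frac{1}{n} \log (n + 1)^{|\mathcal{Z}|(|\mathcal{X}| + 1 + |\mathcal{Y}|) + 4}$.

Moreover, if $\log|B^n(Q_X)| \le nR$ then

$$\Pr(d(X^n, \hat{X}^n) > \Delta | X^n = \mathbf{x}, Y^n = \mathbf{y}, Z^n = \mathbf{z}) = 0.$$

Proof: For the given sequence $(\mathbf{x}, \mathbf{y}, \mathbf{z})$ let $L$ be the event that $\mathbf{z} \ne \varphi_1(\psi(\mathbf{x}), \mathbf{y})$. (Observe that $L$ occurs when the decoder decodes the wrong codeword and that $\Pr(d(X^n, \hat{X}^n) > \Delta | X^n = \mathbf{x}, Y^n = \mathbf{y}, Z^n = \mathbf{z})$ is upper bounded by $\Pr(L | X^n = \mathbf{x}, Y^n = \mathbf{y}, Z^n = \mathbf{z})$.)
If $Q_X$ is such that $\log|B^n(Q_X)| \le nR$, then

$$\Pr(L | X^n = \mathbf{x}, Y^n = \mathbf{y}, Z^n = \mathbf{z}) = 0.$$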
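For the case where $\log|B^n(Q_X)| > nR$ (i.e. $(\mathbf{x}, \mathbf{y}, \mathbf{z}) \in \mathcal{E}_b$), we note that the set $S(\mathbf{z}|\mathbf{y})$ contains all strings $\tilde{\mathbf{z}}$ having the property that $\tilde{\mathbf{z}}$ has the same type as $\mathbf{z}$ and conditional empirical entropy with $\mathbf{y}$ that does not exceed $H(\mathbf{z}|\mathbf{y})$.

$$\Pr(L | X^n = \mathbf{x}, Y^n = \mathbf{y}, Z^n = \mathbf{z})$$

$$\le \sum_{\substack{\tilde{\mathbf{z}} \in S(\mathbf{z}|\mathbf{y}) \\ \tilde{\mathbf{z}} \ne \mathbf{z}}} \Pr(\tilde{\mathbf{z}} \in B^n(Q_X), U(\tilde{\mathbf{z}}) = U(\mathbf{z}) | X^n = \mathbf{x}, Y^n = \mathbf{y}, Z^n = \mathbf{z})$$

$$\stackrel{(a)}{\le} \sum_{\substack{\tilde{\mathbf{z}} \in S(\mathbf{z}|\mathbf{y}) \\ \tilde{\mathbf{z}} \ne \mathbf{z}}} \Pr(\tilde{\mathbf{z}} \in B^n(Q_X) | X^n = \mathbf{x}, Y^n = \mathbf{y}) \Pr(U(\tilde{\mathbf{z}}) = U(\mathbf{z}) | \tilde{\mathbf{z}} \in B^n(Q_X))$$

$$\stackrel{(b)}{\le} \sum_{\substack{\tilde{\mathbf{z}} \in S(\mathbf{z}|\mathbf{y}) \\ \tilde{\mathbf{z}} \ne \mathbf{z}}} (n + 1)^{|\mathcal{Z}|(1 + |\mathcal{X}|) + 4} \exp\big(n(I(Q_X; Q^*_{Z|X}(Q_X)) - H(Q_Z))\big) \frac{1}{M_1} \qquad (59)$$

where inequality (a) follows from a reasoning similar to that used in the proof of Lemma 6 and (b) follows from Lemma 12. Next,

$$\Pr(L | X^n = \mathbf{x}, Y^n = \mathbf{y}, Z^n = \mathbf{z})$$

$$\le (n + 1)^{|\mathcal{Z}|(|\mathcal{X}| + 1 + |\mathcal{Y}|) + 4} \exp(n H(Q_{Z|Y} | Q_Y)) \exp\big(n(I(Q_X; Q^*_{Z|X}(Q_X)) - H(Q_Z))\big) \frac{1}{M_1}$$

$$= (n + 1)^{|\mathcal{Z}|(|\mathcal{X}| + 1 + |\mathcal{Y}|) + 4} \exp\big(-n(R - J(Q_{\mathbf{xyz}}))\big)$$

where the first line follows from Lemma 4. Also, since $\Pr(L | X^n = \mathbf{x}, Y^n = \mathbf{y}, Z^n = \mathbf{z}) \le 1$ we get

$$\Pr(L | X^n = \mathbf{x}, Y^n = \mathbf{y}, Z^n = \mathbf{z}) \le \exp\big(-n |R - J(Q_{\mathbf{xyz}}) - \delta_n|^+\big).$$

■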
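To make the exponent in (58) concrete, the sketch below evaluates $|R - J(Q_{XYZ}) - \delta_n|^+$ for a joint distribution supplied as a dictionary. It assumes $J = I(X;Z) - I(Y;Z)$ computed from the marginals of the supplied joint type and $\delta_n = \frac{1}{n}(|\mathcal{Z}|(|\mathcal{X}| + 1 + |\mathcal{Y}|) + 4)\log(n+1)$; the function names and the example distributions are illustrative assumptions, not part of the paper.

```python
import math


def mutual_information(joint):
    """I(A;B) in nats for a joint pmf given as {(a, b): probability}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * math.log(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)


def marginal(q_xyz, keep):
    """Marginalize {(x, y, z): p} onto the coordinates listed in keep."""
    out = {}
    for triple, p in q_xyz.items():
        key = tuple(triple[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out


def binning_exponent(q_xyz, rate, n, nx, ny, nz):
    """|R - J - delta_n|^+ with J = I(X;Z) - I(Y;Z), all in nats."""
    j = (mutual_information(marginal(q_xyz, (0, 2)))
         - mutual_information(marginal(q_xyz, (1, 2))))
    delta_n = (nz * (nx + 1 + ny) + 4) * math.log(n + 1) / n
    return max(0.0, rate - j - delta_n)


if __name__ == "__main__":
    # Side information identical to the source: J = 0, so binning is cheap.
    q = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
    print(binning_exponent(q, rate=1.0, n=10**6, nx=2, ny=2, nz=2))
```

When the side information is independent of the codeword, $J$ equals $I(X;Z)$ and the exponent is zero for any rate below that mutual information, which is the regime where binning cannot help.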

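Lemma 14. Let $\delta_n \ge 0$, $\delta_n \to 0$ as $n \to \infty$,

$$G^n[Q_{XYZ}, P_{XY}, f, d, \Delta, R] = \begin{cases} D(Q_{XYZ} \| P_{XY} Q_{Z|X}) & E_Q[d(X, f(Y, Z))] > \Delta \\ D(Q_{XYZ} \| P_{XY} Q_{Z|X}) + |R - I(Q_X; Q_{Z|X}) + I(Q_Y; Q_{Z|Y}) - \delta_n|^+ & E_Q[d(X, f(Y, Z))] \le \Delta, \ I(Q_X; Q_{Z|X}) \ge R - \delta_n \\ \infty & \text{otherwise} \end{cases}$$

and

$$\theta^n(P_{XY}, d, \Delta, R) = \min_{Q_X} \max_{Q_{Z|X} \in \mathcal{C}^n(\mathcal{X} \to \mathcal{Z})} \min_{Q_Y} \max_{f \in \mathcal{F}} \min_{Q_{XYZ}} G^n(Q_{XYZ}, P_{XY}, f, d, \Delta, R),$$

$$\theta(P_{XY}, d, \Delta, R) = \inf_{Q_X} \sup_{Q_{Z|X}} \inf_{Q_Y} \sup_{f \in \mathcal{F}} \inf_{Q_{XYZ}} G(Q_{XYZ}, P_{XY}, f, d, \Delta, R).$$

In $\theta^n$ the minimizations and maximizations on $Q_X$, $Q_{Z|X}$, $Q_Y$ and $Q_{XYZ}$ are over types/conditional types, and in $\theta$ they are over distributions. And, in the optimization of $Q_{XYZ}$ the marginal type/distribution of $X$ and $Y$ and conditional type/distribution of $Z$ given $X$ are taken to be those specified earlier in the optimization. Then

$$\liminf_{n \to \infty} \theta^n(P_{XY}, d, \Delta, R) \ge \theta(P_{XY}, d, \Delta, R). \qquad (60)$$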



Proof Let Q^x\ Qz\x^ Qy\ Qxyz and be such that 



9^{PxY, d, A, R) = G"(g?J.^, PxY, /("^ d, A, it:). 

For convenience, henceforth we omit writing the arguments PxY,d,A and R in G{-) and G'"(-). Also, 
when necessary for clarity, we expand Qxyz — Qx, Qz\x, Qy, Qy\xz in the argument to G and G"(-). 

By boundedness there exists a subsequence of (Qx\Q^z\xiQy\Qxyz) with index n' such that the 
sequence (g?'\ Q^"!, Qf'\ QS^, converges to a limit (Qf , Qf;^, Q??, Qfyz, /°°). Let 5 > 0, 
then there exists Qfj^ so that 

infsup inf G{Qx ,Q^x^Qy,QxyzJ) > supinfsup inf G{Qx,Qz\x,Qy,QxyzJ) - S 

Qy f Qy\XZ Qz\X / ^Y\XZ 

and there is a sequence Q^z\x converging to Q^^. Let 

(5y "* = arg min max min G^{QxYZ,f) 

Qy f Qxyz- 
Qx=q'^'^ 

Qz\x=Q^z\x 
Qy=Qy 

and by considering a further subsequence we may assume that Qy ^ Qy - Then there exists f°° so that 

inf G(Q-,g^;„g-,gy|xz,/°°)>max inf G{Q^,Q^^^,Q^,Qy^xzJ) 

Wy\xz J Qy\xz 

and we set = /°°. Let 

gS^= argmin G'^'(gxyz, Z^"')) 

Qxyz- 

Qx=Q'x'^ 

Qz\x=Q^z\x 



QY=Q'y'^ 



and by considering a further subsequence we may assume that Qxyz ~^ Qxyz- Observe that 

9'^'{PxY,d,A,R) ^ max min max min G'"'(gJ' \ g^i^, gy, gy|xz, /) 

Qz\x^C"'{X->-Z) Qy /eJ^ Qy\xz 

> min max min G"-' {Q^^ \ Q^^l , Qy , Qy\x z , f) 

Qy f^T Qy\xz ' 



max min (g^" ^ g^^^, g^." \ gy|xz, /) 



/e^ Qy\xz 



> min G(g(,"'\g^"'i,g?'\gy|xz,/^"')) 



We now verify that 



liminf G"'(g^;i^,/>')) > G(g~y^,f 
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If {Qxyzj /°°) su'^h that Eqcx,^^ [d{X, f°^{Y, Z))] > A, then the resuh holds using the semicontinuity of 



z\x. 



> 



the information measures. If (Qxyz^ f^) are such that EQ^^Jd{X, f^{Y, Z))] < A and /(Q^; Q 

R, then the sequence C^' {Q^^yz^ f'^^''^) is either oo or equal to G{Qxyz^ /°^) '^^ account of the continuity 
of Eq[-]. In the final case that (Q^y^, /°°) are such that EQ^^Jd{X, f°^{Y, Z))] < A and /(Q^; Q|j^) < 
R, then we must have limsup^.^^^ HQx'^ Q^x) + < Therefore 

liminf (Qx \ Q%x^ Q^y \ Qxyz^ /°°) — ^(Qx^ Qfix^ Qy ^ Qxyz^ f°°) 

— C^iQx^ Q'z\xi Qy' Qxyz, /°°) 

QXYZ 

= sup inf Gm.Q'^\x.QY.QxYzJ) 

f Qxyz 

> inf sup inf G{Q^,Q^j„QY,QxYZ,f) 

Qy f Qxyz 

> sup inf sup inf G{Qx, Qz\x, Qy, Qxyz, f)-S 



iZ\X 



f 



> inf sup inf sup inf G{Qx, Qz\x, Qy, Qxyz, f) - 5 

Qx Q^IX f '^Xi'Z 

= e^{PxY,d,A,R)-6 

Hence $\liminf_{n' \to \infty} \theta^{n'}(P_{XY}, d, \Delta, R) \ge \liminf_{n' \to \infty} G(Q^{(n')}_{XYZ}, f^{(n')}) \ge \theta(P_{XY}, d, \Delta, R) - \delta$. Letting $\delta \downarrow 0$ gives the result. ■
We are now in a position to prove Theorem 3. We will accomplish this by giving an upper bound on the probability of error by considering the error events separately.

Proof of Theorem ^ We start by noting that for n sufficiently large the constraint of equation Q 
is satisfied. Summing over sequences gives 

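$$\Pr(d(X^n, \hat{X}^n) > \Delta)$$

$$= \sum_{\mathbf{x}, \mathbf{y}, \mathbf{z}} \Pr(d(X^n, \hat{X}^n) > \Delta | X^n = \mathbf{x}, Y^n = \mathbf{y}, Z^n = \mathbf{z}) \times \Pr(X^n = \mathbf{x}, Y^n = \mathbf{y}, Z^n = \mathbf{z})$$

$$\le \sum_{\mathcal{E}_b} \Pr(d(X^n, \hat{X}^n) > \Delta | X^n = \mathbf{x}, Y^n = \mathbf{y}, Z^n = \mathbf{z}) \times \Pr(X^n = \mathbf{x}, Y^n = \mathbf{y}, Z^n = \mathbf{z})$$
$$+ \sum_{\mathcal{E}_c} \Pr(X^n = \mathbf{x}, Y^n = \mathbf{y}, Z^n = \mathbf{z}) + \sum_{\mathcal{E}_d} \Pr(X^n = \mathbf{x}, Y^n = \mathbf{y}, Z^n = \mathbf{z})$$

where the last inequality followed from upper bounding the conditional error probability by 1 in the summations over $\mathcal{E}_c$ and $\mathcal{E}_d$, and by zero (Lemma 13) on $(\mathcal{E}_b \cup \mathcal{E}_c \cup \mathcal{E}_d)^c$ (the sequences omitted from the sum). Next, we bound the sequence probabilities using Lemma 10 on $\mathcal{E}_b$ and $\mathcal{E}_d$ and Lemma 11 on $\mathcal{E}_c$. We bound the conditional error probability on $\mathcal{E}_b$ using Lemma 13:

$$\Pr(d(X^n, \hat{X}^n) > \Delta) \le \sum_{\mathcal{E}_b} \exp\big(-n |R - J(Q_{\mathbf{xyz}}) - \delta_n|^+\big) \times \frac{P^n_{XY}(\mathbf{x}, \mathbf{y})}{|T_{Q^*_{Z|X}(Q_X)}(\mathbf{x})|} + \sum_{\mathcal{E}_c} \exp(-(n + 1)^2) + \sum_{\mathcal{E}_d} \frac{P^n_{XY}(\mathbf{x}, \mathbf{y})}{|T_{Q^*_{Z|X}(Q_X)}(\mathbf{x})|}.$$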
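Observing that the summation over $\mathcal{E}_c$ decays super-exponentially, we may safely omit this term, and use the notation $\dot{\le}$ to denote inequality to the first order of the exponent. We can rewrite the above by first summing over types and then over sequences within each type class. This gives us

$$\Pr(d(X^n, \hat{X}^n) > \Delta) \ \dot{\le} \ \sum_{Q_X} \sum_{Q_Y} \Bigg[ \sum_{Q_{XYZ} \in \mathcal{D}_b} \ \sum_{(\mathbf{x}, \mathbf{y}, \mathbf{z}) \in T_{Q_{XYZ}}} \exp\big(-n |R - J(Q_{XYZ}) - \delta_n|^+\big) \frac{P^n_{XY}(\mathbf{x}, \mathbf{y})}{|T_{Q^*_{Z|X}(Q_X)}(\mathbf{x})|} + \sum_{Q_{XYZ} \in \mathcal{D}_d} \ \sum_{(\mathbf{x}, \mathbf{y}, \mathbf{z}) \in T_{Q_{XYZ}}} \frac{P^n_{XY}(\mathbf{x}, \mathbf{y})}{|T_{Q^*_{Z|X}(Q_X)}(\mathbf{x})|} \Bigg].$$

Note that in the summation over joint types $Q_{XYZ}$, the marginal types of $X$ and $Y$ are fixed to be those set by the earlier summations. Proceeding in a similar manner as was taken in going from (36) to (40) in the SCCSI proof (with $Z$ taking the role of $S$) we obtain

$$\Pr(d(X^n, \hat{X}^n) > \Delta) \le \sum_{Q_X} \sum_{Q_Y} \Bigg[ \sum_{Q_{XYZ} \in \mathcal{D}_b} \exp\Big(-n\big(D(Q_{XYZ} \| P_{XY} Q_{Z|X}) + |R - J(Q_{XYZ}) - \delta_n|^+\big)\Big) + \sum_{Q_{XYZ} \in \mathcal{D}_d} \exp\big(-n D(Q_{XYZ} \| P_{XY} Q_{Z|X})\big) \Bigg] + \exp\big(-(n + 1)^2 + n \log(|\mathcal{X}||\mathcal{Y}||\mathcal{Z}|)\big).$$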

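Next, we use $a + b \le 2\max(a, b)$ to combine the first two terms. We can then upper bound the summations by maximizing over the types, and since the choice of test channel $Q^*_{Z|X}$ and estimation function $f$ were arbitrary, we can optimize to give

$$\Pr(d(X^n, \hat{X}^n) > \Delta) \le |\mathcal{P}^n(\mathcal{X})| \max_{Q_X} \min_{Q^*_{Z|X}} |\mathcal{P}^n(\mathcal{Y})| \max_{Q_Y} 2|\mathcal{P}^n(\mathcal{X} \times \mathcal{Y} \times \mathcal{Z})| \min_{f \in \mathcal{F}} \max_{\substack{Q_{XYZ}: \\ Q_{Z|X} = Q^*_{Z|X}}} \exp\big(-n G^n[Q_{XYZ}, P_{XY}, f, d, \Delta, R]\big)$$

where we used the definition of $G^n$ from Lemma 14, taking $\delta_n = n^{-1}(|\mathcal{X}||\mathcal{Z}| + 4) \log(n + 1)$. Moving the optimizations into the exponent we get

$$\Pr(d(X^n, \hat{X}^n) > \Delta) \le 2|\mathcal{P}^n(\mathcal{X})||\mathcal{P}^n(\mathcal{Y})||\mathcal{P}^n(\mathcal{X} \times \mathcal{Y} \times \mathcal{Z})| \exp\Big(-n \Big[ \min_{Q_X} \max_{Q^*_{Z|X}} \min_{Q_Y} \max_{f \in \mathcal{F}} \min_{Q_{XYZ}} G^n[Q_{XYZ}, P_{XY}, f, d, \Delta, R] \Big]\Big).$$

We can absorb the set cardinalities, which grow only polynomially in $n$, into a term $\delta_2^n$ in the exponent and observe that in the limit as $n \to \infty$, $\delta_2^n$ vanishes. Hence we have

$$\liminf_{n \to \infty} -\frac{1}{n} \log \Pr(d(X^n, \hat{X}^n) > \Delta)$$

$$\ge \liminf_{n \to \infty} \min_{Q_X} \max_{Q_{Z|X}} \min_{Q_Y} \max_{f \in \mathcal{F}} \min_{Q_{XYZ}} G^n[Q_{XYZ}, P_{XY}, f, d, \Delta, R]$$

$$\ge \inf_{Q_X} \sup_{Q_{Z|X}} \inf_{Q_Y} \sup_{f \in \mathcal{F}} \inf_{Q_{XYZ}} G[Q_{XYZ}, P_{XY}, f, d, \Delta, R],$$

where the final line followed from application of Lemma 14. ■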



Appendix D 
Gaussian Type-classes 

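For the Gaussian case ($\mathcal{X} = \mathcal{Y} = \mathbb{R}$), we need the following definition.$^{12}$ These are a modification of the Gaussian types used by Arikan and Merhav [34]. The difference is that here the type-classes are disjoint and the conditions specifying joint types are independent. This significantly simplifies the subsequent analysis and might prove useful in other applications.

Definition 1. For a given $0 < \epsilon < 1$ and $\sigma_x^2 > 0$, a Gaussian type-class $T^\epsilon_{\sigma_x^2}$ is defined as the set of $n$-sequences

$$T^\epsilon_{\sigma_x^2} = \{\mathbf{x} \in \mathbb{R}^n : |\mathbf{x}^t \mathbf{x} - n\sigma_x^2| < n\epsilon\}.$$

For such a type-class, it can be shown that (see the calculation at the end of this section)

$$\left(1 - \frac{2\sigma_x^4}{n\epsilon^2}\right) \exp\left(n\left(h(\sigma_x^2) - \frac{\epsilon}{2\sigma_x^2}\right)\right) \le \mathrm{Vol}(T^\epsilon_{\sigma_x^2}) \le \exp\left(n\left(h(\sigma_x^2) + \frac{\epsilon}{2\sigma_x^2}\right)\right). \qquad (61)$$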
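The bounds in (61) can be checked numerically, because the type-class of Definition 1 is a spherical shell whose volume is available in closed form. The sketch below assumes $h(\sigma_x^2) = \frac{1}{2}\log(2\pi e \sigma_x^2)$ (natural logarithms), $\epsilon < \sigma_x^2$, and $n\epsilon^2 > 2\sigma_x^4$ so that the lower bound is non-trivial; the function names are illustrative and not from the paper.

```python
import math


def h(sigma2):
    """Differential entropy (nats) of a scalar Gaussian with variance sigma2."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma2)


def log_volume_exact(n, sigma2, eps):
    """log Vol{x in R^n : |x.x - n*sigma2| < n*eps}, a shell between two spheres."""
    log_unit_ball = 0.5 * n * math.log(math.pi) - math.lgamma(0.5 * n + 1.0)
    log_outer = 0.5 * n * math.log(n * (sigma2 + eps))
    inner_over_outer = ((sigma2 - eps) / (sigma2 + eps)) ** (0.5 * n)
    return log_unit_ball + log_outer + math.log1p(-inner_over_outer)


def log_volume_bounds(n, sigma2, eps):
    """Logarithms of the lower and upper bounds in (61) as read above."""
    lower = (n * (h(sigma2) - eps / (2.0 * sigma2))
             + math.log(1.0 - 2.0 * sigma2 ** 2 / (n * eps ** 2)))
    upper = n * (h(sigma2) + eps / (2.0 * sigma2))
    return lower, upper


if __name__ == "__main__":
    lo, hi = log_volume_bounds(200, 1.0, 0.2)
    print(lo, log_volume_exact(200, 1.0, 0.2), hi)
```

For these sample values the exact log-volume lies much closer to the upper bound, reflecting that most of the shell's volume sits near its outer radius.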



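Similarly, for a given $0 < \epsilon < 1$ and covariance matrix

$$K = \begin{bmatrix} \sigma_X^2 & \rho\sigma_X\sigma_Y \\ \rho\sigma_X\sigma_Y & \sigma_Y^2 \end{bmatrix}$$

with non-zero variances, a joint Gaussian type-class is defined as the set of pairs of $n$-sequences

$$T^\epsilon_K = \Big\{ (\mathbf{x}, \mathbf{y}) \in \mathbb{R}^{2n} : |\mathbf{x}^t \mathbf{x} - n\sigma_X^2| < n\epsilon, \ |\mathbf{y}^t \mathbf{y} - n\sigma_Y^2| < n\epsilon, \ \big|\mathbf{x}^t \mathbf{y} - \rho\sqrt{\mathbf{x}^t \mathbf{x} \, \mathbf{y}^t \mathbf{y}}\big| < \epsilon\sqrt{\mathbf{x}^t \mathbf{x} \, \mathbf{y}^t \mathbf{y}} \Big\}.$$

This set has the corresponding volume bound

$$\mathrm{Vol}(T^\epsilon_K) \le \exp\big(n(h(K) + o_\epsilon(1))\big), \qquad (62)$$

where we use $o_\epsilon(1)$ to denote a quantity $g(\epsilon) \ge 0$ having the property that $\lim_{\epsilon \to 0} g(\epsilon) = 0$.

Furthermore, for a given $\mathbf{x} \in T^\epsilon_{\sigma_X^2}$, we define the conditional Gaussian type-class $T^\epsilon_K(\mathbf{x})$ as the set of $n$-sequences

$$\{\mathbf{y} \in \mathbb{R}^n : (\mathbf{x}, \mathbf{y}) \in T^\epsilon_K\}.$$

$^{12}$For more than two jointly Gaussian random variables, these definitions can be extended in the obvious way.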
For this set one can show that (see the calculation at the end of this section)

$$\mathrm{Vol}(T^\epsilon_K(\mathbf{x})) \ge \left(1 - \frac{1}{n \, o_\epsilon(1)} - o_\epsilon(1)\right) \exp\big(n(h(K_{Y|X}) - f_\epsilon)\big), \qquad (63)$$

where $f_\epsilon$ is an $o_\epsilon(1)$ term whose value is determined in the proof. In Appendix D-3 we show for a Gaussian distribution $f_K(\cdot, \cdot)$, if $(\mathbf{x}, \mathbf{y}) \in T^\epsilon_{\hat{K}}$, where $\hat{K}$ is any positive definite covariance matrix, then

$$f^n_K(\mathbf{x}, \mathbf{y}) \le \exp\big(-n(D(\hat{K} \| K) + h(\hat{K}) - o_\epsilon(1))\big). \qquad (64)$$

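The analysis for the Gaussian case requires that we "quantize" the space of $3 \times 3$ covariance matrices. Unlike discrete memoryless sources, Gaussian sources require use of a "bounding box" to limit the number of types. To this end, fix $0 < M_L < 1$ and $M_U > M_L$; both will be chosen later. For a fixed $0 < \epsilon < M_L$ define $\sigma^2(i) = M_L + 2i\epsilon$ and, for the given $\epsilon$, define $\eta_{ij}(r) = \sqrt{\sigma^2(i)\sigma^2(j)}\,(-1 + 2\epsilon(r - 1))$. We will consider type-classes indexed by matrices of the form

$$K(i, j, k, r, s, t) = \begin{bmatrix} \sigma^2(i) & \eta_{ij}(r) & \eta_{ik}(s) \\ \eta_{ij}(r) & \sigma^2(j) & \eta_{jk}(t) \\ \eta_{ik}(s) & \eta_{jk}(t) & \sigma^2(k) \end{bmatrix}$$

and $i, j, k, r, s, t \ge 1$; note that not all of these matrices are positive semidefinite.

We let $\mathcal{P}_X = \{i : \exists \, \mathbf{x} \in T^\epsilon_{\sigma^2(i)} \text{ with } \mathbf{x}^t \mathbf{x} \le nM_U\}$ and similarly $\mathcal{P}_{XYZ} = \{(i, j, k, r, s, t) : \exists \, (\mathbf{x}, \mathbf{y}, \mathbf{z}) \in T^\epsilon_{K(i,j,k,r,s,t)} \text{ with } \mathbf{x}^t \mathbf{x} \le nM_U \text{ and } \mathbf{y}^t \mathbf{y} \le nM_U \text{ and } \mathbf{z}^t \mathbf{z} \le nM_U\}$, where $M_U \ge M_L$. With $S_L = \{(\mathbf{x}, \mathbf{y}, \mathbf{z}) : \mathbf{x}^t \mathbf{x} < n(M_L + \epsilon) \text{ or } \mathbf{y}^t \mathbf{y} < n(M_L + \epsilon) \text{ or } \mathbf{z}^t \mathbf{z} < n(M_L + \epsilon)\}$ and $S_U = \{(\mathbf{x}, \mathbf{y}, \mathbf{z}) : \mathbf{x}^t \mathbf{x} > nM_U \text{ or } \mathbf{y}^t \mathbf{y} > nM_U \text{ or } \mathbf{z}^t \mathbf{z} > nM_U\}$, the union of the shells $T^\epsilon_{K(i,j,k,r,s,t)}$ and the set $S_L$ cover $\mathbb{R}^{3n}$ entirely and we define $\mathcal{R}^{3n} = \mathbb{R}^{3n} \setminus (S_L \cup S_U)$. We denote by $\nu(\mathbf{x})$ the index of the shell containing the string $\mathbf{x}$, i.e. $\mathbf{x} \in T^\epsilon_{\sigma^2(\nu(\mathbf{x}))}$, which is uniquely defined almost everywhere in $\mathcal{R}^{3n}$.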
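The shell index $\nu(\mathbf{x})$ is simple to compute once the grid is fixed. The following minimal sketch assumes the variance grid $\sigma^2(i) = M_L + 2i\epsilon$ with $i \ge 1$ and the shell condition $|\mathbf{x}^t\mathbf{x} - n\sigma^2(i)| < n\epsilon$, and returns `None` for sequences outside the bounding box or exactly on a shell boundary; the function name and argument names are hypothetical.

```python
def shell_index(x, m_low, m_up, eps):
    """Index i of the shell with |x.x/n - sigma2(i)| < eps, or None.

    Assumes sigma2(i) = m_low + 2*i*eps for i >= 1, so the shells tile
    the interval (m_low + eps, infinity) up to boundary points.
    """
    n = len(x)
    power = sum(v * v for v in x) / n
    if power < m_low + eps or power > m_up:
        return None  # outside the bounding box
    i = round((power - m_low) / (2.0 * eps))
    if i >= 1 and abs(power - (m_low + 2.0 * i * eps)) < eps:
        return i
    return None  # on a boundary between adjacent shells


if __name__ == "__main__":
    x = [0.9 ** 0.5] * 50  # empirical power 0.9
    print(shell_index(x, m_low=0.5, m_up=4.0, eps=0.1))
```

Only this index, not the sequence itself, needs to be conveyed as the "type" of a source sequence, and the number of admissible indices is bounded because of the bounding box.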



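1) Proof of (61): Let $\mathbf{X} \sim \mathcal{N}(0, \sigma_x^2 I)$. Then

$$1 \ge \int_{T^\epsilon_{\sigma_x^2}} (2\pi\sigma_x^2)^{-n/2} \exp\left(-\frac{\mathbf{x}^t \mathbf{x}}{2\sigma_x^2}\right) d\mathbf{x} \ge \int_{T^\epsilon_{\sigma_x^2}} (2\pi\sigma_x^2)^{-n/2} \exp\left(-\frac{n(\sigma_x^2 + \epsilon)}{2\sigma_x^2}\right) d\mathbf{x}$$

$$= \mathrm{Vol}(T^\epsilon_{\sigma_x^2}) \exp\left(-n\left(\frac{1}{2}\log(2\pi e \sigma_x^2) + \frac{\epsilon}{2\sigma_x^2}\right)\right),$$

which gives the upper bound. For the lower bound,

$$\Pr(T^\epsilon_{\sigma_x^2}) = \int_{T^\epsilon_{\sigma_x^2}} (2\pi\sigma_x^2)^{-n/2} \exp\left(-\frac{\mathbf{x}^t \mathbf{x}}{2\sigma_x^2}\right) d\mathbf{x} \le \int_{T^\epsilon_{\sigma_x^2}} (2\pi\sigma_x^2)^{-n/2} \exp\left(-\frac{n(\sigma_x^2 - \epsilon)}{2\sigma_x^2}\right) d\mathbf{x}$$

$$= \mathrm{Vol}(T^\epsilon_{\sigma_x^2}) \exp\left(-\frac{n}{2}\log(2\pi e \sigma_x^2) + \frac{n\epsilon}{2\sigma_x^2}\right).$$

Conversely, by Chebyshev's inequality

$$1 - \Pr(T^\epsilon_{\sigma_x^2}) = \Pr\big(|\mathbf{X}^t \mathbf{X} - n\sigma_x^2| \ge n\epsilon\big) \le \frac{E\big[(\mathbf{X}^t \mathbf{X} - n\sigma_x^2)^2\big]}{n^2\epsilon^2} = \frac{2\sigma_x^4}{n\epsilon^2}.$$

Combining these two calculations gives the lower bound.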



2) Proof of (63); Let x G T^2 , then 

'^x 



y e 



P1 



y*x 



n 



x*x y*y 



n n 



< e 



x*x y*y 

n n 



By the triangle inequality 



y*x 

n 



x*x y*y 



n n 



< 



y*x 

n 



n 



-Oy 



n 



-ay - p 



x*x y*y 



n n 



whence 



r^(x) D ^(x) 4 |y e 



|y*y — ncTyl < ne 



y*x 



n 



- P 



x*x 



-O-y 



< 



x*x 



x*x 



-ay - p 



x*x y*y 



< 



cr 



y 



x*x 



n \ n n 

Let V be a Gaussian random vector whose law is J\f{0, IcTyil — p^)), and let Y — 
the union bound gives 

"y*x 



n 



=x + V. Applying 



Pr(A(x)") < Pr (|Y*Y - nal\ > ne) + Pr 



n 



+ Pr 



n 



-ay - p 



x*x Y* Y 



n n 



> 




The event in the third probability on the right is equivalent to 



Y*Y 



e^ia^ 



Y 



n 



> eay\ a 
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Using this fact and bounding each of the probabilities using Chebyshev's inequality yields 

payf 



Pr(A(x)^) < E 



(Y*Y - naif 



+ E 



^ n V x*x 



+ E 



e2(4-e)/4 



e^a\{a\ - e) 



24 , 4(1 -P^) , 



24 



no,{l) 



+ 
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To bound the volume we note that under the law above 



Pr(A(x)) 



npcTY^X 



x*x 



)dy 



exp 



V x^x ' 



A(x) 



2(4(1 -p^)) 



We can get an upper bound on the density by lower bounding the summand in the exponent 



x*x 



x*x 



> 72(4 ~ ^) + np^ay — ^payniypaY + sgn(p)e/2Y4 ~ ^) 
= n(4(l-p2)-/,(p,ay)) 

where f^{p,aY) = e(l + psgn(p)cryA/4 ~ ^) 8°^^ ^'^ '^i'^h e. Thus 
Pr(A(x)) < Vol(A(x))exp 



Vol(A(x))exp 



-n 



-n 



^log(27ra^(l-p2))-i-/,(p,ay) 



log(27re4(l-p2))_/^(p,ay) 



(*) 



where /, = /,/(2(4(l - p^))). Combining this with (*) and using the fact that Vol(T^(x)) > Vol(A(x)) 
gives the result. ■ 

3) Proof of (|64]); Let (X, F) ~ Ar(0,K) and (x,y) e T|. Then 

/(x,y) = [(27r)^|ir|]-t 



X exp 



2(1 -p2) \ a\ 4 cTxo-y/ 



Applying the bounds from the definition of T|. allows us to continue the inequality with 



< 



Ti 1 

exp I -2^°s((27r)2|ir|)-^^^— ^ 



;;-2 



X 



a 



X 



+ 



4-e) 2pn^(4 + e)(4 + e)(p + sgn(p)e) 



axCTy 
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For f^{ax,o-Y,crx,o-Y, p, p) (which goes to zero with e), we can write 



<exp-n(^^(^\og{{2nf\K\) + -^ 



+ 



2pax(TYp 

Finally, using the identity 



axcryil - p2) 



-p2) 4(1 -p2) 
fe{ax,crY,ax,&Y,p,p)^ 



gives 



D{K\\K) = \ {log ^ + Tt{K-'K) - 2^ 
/(x,y) < ex^-n{D{k\\K) + ^ log(27re)2|i^| 



2 

- /e(cTx,cry,ax,Cry,p,p) 

Appendix E 
Proof of Theorem [5] 

A. Scheme 

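Let $\epsilon > 0$ and $M_L$, $M_U$ be as defined in Appendix D. For each blocklength $n$, and for each shell of $n$-length $\mathbf{x}$ sequences, $T^\epsilon_{\sigma^2(i)}$, we choose a Gaussian test channel. The test channel is specified by selecting integers $k(i)$ and $s(i)$ (such that $\sigma^2(k(i)) < M_U$) so that if $X \sim \mathcal{N}(0, \sigma^2(i))$ is the input to the channel then $(X, Z) \sim \mathcal{N}(0, \bar{\sigma}^2(i))$, where the bar applied to a scalar results in

$$\bar{\sigma}^2(i) = \begin{bmatrix} \sigma^2(i) & \eta_{i k(i)}(s(i)) \\ \eta_{i k(i)}(s(i)) & \sigma^2(k(i)) \end{bmatrix}. \qquad (65)$$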

The codebook for the $i$th shell of $\mathbf{x}$ sequences is a randomly chosen set of codewords, $B^n(i)$, selected in the following way. The size of $B^n(i)$ is an integer satisfying

$$\exp\big(n(I_{\bar{\sigma}^2(i)}(X; Z) + 2g_\epsilon)\big) \le |B^n(i)| \le \exp\big(n(I_{\bar{\sigma}^2(i)}(X; Z) + 3g_\epsilon)\big) \qquad (66)$$

where $g_\epsilon = f_\epsilon + \epsilon/(2\sigma^2(k(i)))$ (cf. (63)) and the codewords are chosen uniformly from the shell $T^\epsilon_{\sigma^2(k(i))}$.

For $\mathbf{x} \in T^\epsilon_{\sigma^2(i)}$, define $Z(\mathbf{x}) : T^\epsilon_{\sigma^2(i)} \to B^n(i)$ as follows. We can cover the shell $T^\epsilon_{\sigma^2(i)}$ with conditional type-classes $T^\epsilon_{\bar{\sigma}^2(i)}(B^n(i)[j])$, where $B^n(i)[j]$ is the $j$th codeword. This covering induces a partition of sequences in $T^\epsilon_{\sigma^2(i)}$, with the partition being based on the set of possible codewords in $B^n(i)$ that have the correct joint type with the sequences. For each set generated by this partition, we choose the codeword for that set uniformly among the covering conditional type-classes. For the sets not covered by any class, the codeword is selected at random from $B^n(i)$. We define $Z^n = Z(X^n)$. Finally, let the encoder's message set be defined as $\mathcal{M} = \mathcal{M}_1 \times \mathcal{M}_2$, where

$$\mathcal{M}_1 = \{1, \ldots, M_1 = \exp(nR)\}, \quad \mathcal{M}_2 = \{1, 2, \ldots, |\mathcal{P}_X|\}.$$

Operation of the Encoder: To encode a sequence $\mathbf{x} \in T^\epsilon_{\sigma^2(i)}$, the encoder sends $i$, the "type" of $\mathbf{x}$, and an index, $U(Z(\mathbf{x}))$, of the codeword $Z(\mathbf{x})$. If $\log|B^n(i)| > nR$ we use random binning of the codewords, and $U(Z(\mathbf{x}))$ denotes the element of $\mathcal{M}_1$ to which $Z(\mathbf{x})$ is mapped. For sequences with $\mathbf{x}^t \mathbf{x} \notin [n(M_L + \epsilon), nM_U]$ the encoder declares an error. The encoder can be expressed mathematically as

$$\psi(\mathbf{x}) = (U(Z(\mathbf{x})), i) \quad \text{for } \mathbf{x} \in T^\epsilon_{\sigma^2(i)}. \qquad (67)$$
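Operation of the Decoder: The decoder operates in a two-step manner. First it attempts to recover the codeword $Z^n$:

1) If $\log|B^n(i)| \le nR$ then $Z^n$ can be decoded without error.

2) If $\log|B^n(i)| > nR$ the decoder receives a bin index and uses the side information to pick the $\mathbf{z}$ from the bin by searching for a $\hat{\mathbf{z}}$ in the received bin so that among all $\tilde{\mathbf{z}}$ in the bin, $\rho^2_{\tilde{\mathbf{z}}, \mathbf{y}} < \rho^2_{\hat{\mathbf{z}}, \mathbf{y}}$. If there is no such $\hat{\mathbf{z}}$, the decoder picks uniformly at random from the bin.

Let

$$\varphi_1(l, i, \mathbf{y}) = \begin{cases} \hat{\mathbf{z}} & \text{if } \hat{\mathbf{z}} \in \mathrm{Bin}(l) \text{ and } \forall \tilde{\mathbf{z}} \ne \hat{\mathbf{z}}, \ \tilde{\mathbf{z}} \in \mathrm{Bin}(l), \ \rho^2_{\tilde{\mathbf{z}}, \mathbf{y}} < \rho^2_{\hat{\mathbf{z}}, \mathbf{y}} \\ \text{any } \mathbf{z} & \text{if no such } \hat{\mathbf{z}} \in \mathrm{Bin}(l) \end{cases} \qquad (68)$$

where $\mathrm{Bin}(l) = \{\mathbf{z} : \mathbf{z} \in B^n(i) \text{ and } U(\mathbf{z}) = l\}$ denotes the set of codewords that are assigned to bin $l$. The marginal types $i, j$ of $\mathbf{x}$ and $\mathbf{y}$ are known, and for each pair $i, j$ we choose an estimation function. We restrict our attention to estimation functions that are linear in the side information and the codeword, i.e. $\lambda_{ij}(y, z) = \alpha(i, j) y + \beta(i, j) z$, where $\alpha(i, j) = p\epsilon$, $\beta(i, j) = k\epsilon$ for integers $p, k$ so that $\alpha(i, j), \beta(i, j) \in [-M_X, M_X]$. $\alpha$ and $\beta$ will be optimized later and $M_X > 0$ is an arbitrary positive constant. For the second step the decoder uses the estimation function, $\lambda$, to combine the side information $\mathbf{y}$ with codeword $\hat{\mathbf{z}}$ to give the reproduction $\hat{\mathbf{x}}$. This is expressed mathematically as

$$\varphi(l, i, \mathbf{y}) = \hat{\mathbf{x}} \qquad (69)$$
$$\text{s.t. } \hat{x}_m = \alpha(i, \nu(\mathbf{y})) y_m + \beta(i, \nu(\mathbf{y})) \varphi_1(l, i, \mathbf{y})_m.$$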
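The second decoding step is a componentwise linear combination, which the following minimal sketch illustrates. It assumes coefficients restricted to integer multiples of $\epsilon$ inside $[-M_X, M_X]$ and squared-error distortion averaged over the block; the function names are hypothetical, and the choice of coefficients per pair of types is left open here, as it is optimized later in the analysis.

```python
def quantized_coefficients(eps, m_x):
    """All integer multiples of eps that lie in [-m_x, m_x]."""
    k_max = int(m_x / eps + 1e-9)  # small slack guards against float round-off
    return [k * eps for k in range(-k_max, k_max + 1)]


def linear_reproduction(y, z, alpha, beta):
    """Componentwise estimate x_hat[m] = alpha * y[m] + beta * z[m]."""
    return [alpha * a + beta * b for a, b in zip(y, z)]


def average_distortion(x, x_hat):
    """Normalized squared error (1/n) * ||x - x_hat||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


if __name__ == "__main__":
    y, z = [1.0, 2.0], [3.0, 4.0]
    x_hat = linear_reproduction(y, z, alpha=0.5, beta=0.25)
    print(x_hat, average_distortion([1.0, 2.0], x_hat))
```

Restricting the coefficients to a finite grid keeps the number of candidate estimators per type pair finite, so the choice of estimator does not affect the exponent.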

B. Key events 

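The following subsets of $\mathbb{R}^{3n}$ will be of interest.

$$\mathcal{E}_b = \Big\{ (\mathbf{x}, \mathbf{y}, \mathbf{z}) \in \mathcal{R}^{3n} : \mathbf{z} \in T^\epsilon_{\bar{\sigma}^2(\nu(\mathbf{x}))}(\mathbf{x}), \ \tfrac{1}{n}\|\mathbf{x} - \lambda_{\nu(\mathbf{x}), \nu(\mathbf{y})}(\mathbf{y}, \mathbf{z})\|^2 \le \Delta, \ \log|B^n(\nu(\mathbf{x}))| > nR \Big\}$$

$$\mathcal{E}_c = \Big\{ (\mathbf{x}, \mathbf{y}, \mathbf{z}) \in \mathcal{R}^{3n} : \mathbf{z} \notin T^\epsilon_{\bar{\sigma}^2(\nu(\mathbf{x}))}(\mathbf{x}) \Big\}$$

$$\mathcal{E}_d = \Big\{ (\mathbf{x}, \mathbf{y}, \mathbf{z}) \in \mathcal{R}^{3n} : \mathbf{z} \in T^\epsilon_{\bar{\sigma}^2(\nu(\mathbf{x}))}(\mathbf{x}), \ \tfrac{1}{n}\|\mathbf{x} - \lambda_{\nu(\mathbf{x}), \nu(\mathbf{y})}(\mathbf{y}, \mathbf{z})\|^2 > \Delta \Big\}.$$

On $\mathcal{E}_b$, the distortion constraint is violated only if there is a decoding error. On $\mathcal{E}_c$ we say there is a "covering" error: the encoder cannot find a codeword with the desired joint type with the source sequence. On $\mathcal{E}_d$, the distortion constraint will be violated even if the codeword is decoded correctly by the decoder. For $\mathbf{x} \in T^\epsilon_{\sigma^2(i)}$, $F$ is defined to be the event that there exists $\mathbf{z} \in B^n(i)$ such that $\mathbf{z} \in T^\epsilon_{\bar{\sigma}^2(i)}(\mathbf{x})$.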

C. Error Probability Calculation 

We will first state several useful lemmas, which are "Gaussian versions" of the discrete memoryless 
Wyner-Ziv lemmas. 

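Lemma 15. Let $X^n$, $Y^n$, $Z^n = Z(X^n)$ be generated according to our scheme and suppose that $A \subset (\mathcal{E}_c)^c \cap \mathcal{R}^{3n}$. Then

$$\Pr((X^n, Y^n, Z^n) \in A) \le \int_A f^n_{XY}(\mathbf{x}, \mathbf{y}) \frac{1}{\mathrm{Vol}(T^\epsilon_{\bar{\sigma}^2(\nu(\mathbf{x}))}(\mathbf{x}))} \, d\mathbf{x}\mathbf{y}\mathbf{z}. \qquad (70)$$

Proof: For the $(\mathbf{x}, \mathbf{y}, \mathbf{z}) \in A$ in this lemma, $\{X^n = \mathbf{x}, Y^n = \mathbf{y}, Z^n = \mathbf{z}\}$ implies that the event $F$ has occurred. Let $A_{\mathbf{xy}}$ be the projection of $A$ onto $(\mathbf{x}, \mathbf{y})$ space, i.e. $A_{\mathbf{xy}} = \{(\mathbf{x}, \mathbf{y}) : (\mathbf{x}, \mathbf{y}, \mathbf{z}) \in A \text{ for some } \mathbf{z}\}$, and $A^{\mathbf{x}, \mathbf{y}} = \{\mathbf{z} : (\mathbf{x}, \mathbf{y}, \mathbf{z}) \in A\}$. Then

$$\Pr((X^n, Y^n, Z^n) \in A) = \Pr((X^n, Y^n, Z^n) \in A, F)$$

$$= \int_{A_{\mathbf{xy}}} f^n_{XY}(\mathbf{x}, \mathbf{y}) \Pr(F | X^n = \mathbf{x}, Y^n = \mathbf{y}) \times \Pr(Z^n \in A^{\mathbf{x}, \mathbf{y}} | X^n = \mathbf{x}, Y^n = \mathbf{y}, F) \, d\mathbf{x}\mathbf{y}$$

$$\le \int_{A_{\mathbf{xy}}} f^n_{XY}(\mathbf{x}, \mathbf{y}) \times \Pr(Z^n \in A^{\mathbf{x}, \mathbf{y}} | X^n = \mathbf{x}, Y^n = \mathbf{y}, F) \, d\mathbf{x}\mathbf{y}$$

$$= \int_{A_{\mathbf{xy}}} f^n_{XY}(\mathbf{x}, \mathbf{y}) \int_{A^{\mathbf{x}, \mathbf{y}}} f_{Z|X,Y,F}(\mathbf{z} | \mathbf{x}, \mathbf{y}) \, d\mathbf{z} \, d\mathbf{x}\mathbf{y}$$

where in the final line we used that conditional on $F$ and $X^n = \mathbf{x}$, $Z^n$ is uniformly distributed over $T^\epsilon_{\bar{\sigma}^2(\nu(\mathbf{x}))}(\mathbf{x})$ and independent of $Y^n$. ■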

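Lemma 16. Let $X^n$, $Y^n$, $Z^n = Z(X^n)$ be generated according to our scheme. Then for $n$ sufficiently large

$$\Pr((X^n, Y^n, Z^n) \in \mathcal{E}_c) \le |\mathcal{P}_X| \exp(-\exp(n \, o_\epsilon(1))). \qquad (71)$$

Proof: For $(\mathbf{x}, \mathbf{y}, \mathbf{z}) \in \mathcal{E}_c$, $\{X^n = \mathbf{x}, Y^n = \mathbf{y}, Z^n = \mathbf{z}\}$ implies that the event $F^c$ has occurred. Thus

$$\Pr((X^n, Y^n, Z^n) \in \mathcal{E}_c) = \Pr((X^n, Y^n, Z^n) \in \mathcal{E}_c, F^c)$$

$$\le \sum_{i \in \mathcal{P}_X} \Pr(X^n \in T^\epsilon_{\sigma^2(i)}) \Pr(F^c | X^n \in T^\epsilon_{\sigma^2(i)}) \times \Pr((X^n, Y^n, Z^n) \in \mathcal{E}_c | X^n \in T^\epsilon_{\sigma^2(i)}, F^c).$$

$\Pr(F^c | X^n \in T^\epsilon_{\sigma^2(i)})$ is the probability that there is no $\mathbf{z} \in B^n(i)$ so that $\mathbf{z} \in T^\epsilon_{\bar{\sigma}^2(i)}(X^n)$. We will now give an upper bound on this probability using the properties of the codeword set. Let $m = |B^n(i)|$ and $B^n(i)[j]$ be the $j$th codeword in the set $B^n(i)$. Then

$$\Pr(F^c | X^n \in T^\epsilon_{\sigma^2(i)}) = \prod_{j=1}^m \Pr\big(B^n(i)[j] \notin T^\epsilon_{\bar{\sigma}^2(i)}(X^n)\big) = \left(1 - \frac{\mathrm{Vol}(T^\epsilon_{\bar{\sigma}^2(i)}(X^n))}{\mathrm{Vol}(T^\epsilon_{\sigma^2(k(i))})}\right)^m \le \exp\left(-m \frac{\mathrm{Vol}(T^\epsilon_{\bar{\sigma}^2(i)}(X^n))}{\mathrm{Vol}(T^\epsilon_{\sigma^2(k(i))})}\right)$$

where the last line followed by applying the inequality $(1 - t)^m \le \exp(-tm)$. Next, using (61) and (63) to bound the volume of the shells,

$$\le \exp\left(-\left(1 - \frac{1}{n \, o_\epsilon(1)} - o_\epsilon(1)\right) m \exp\big(-n(I_{\bar{\sigma}^2(i)}(X; Z) + g_\epsilon)\big)\right)$$

$$\le \exp(-\exp(n \, o_\epsilon(1)))$$

where the final line followed by substituting our choice of $m$ from (66). ■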
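The doubly exponential decay of the covering-failure probability can be illustrated numerically. The sketch below assumes a per-codeword covering probability $t = \exp(-n(I + g_\epsilon))$ and a codebook of size $m = \exp(n(I + 2g_\epsilon))$, the lower end of (66), and it ignores the polynomial prefactor appearing in the volume bound; the function names and the sample values are illustrative assumptions.

```python
import math


def no_cover_exact(t, m):
    """Probability that none of m independent codewords covers: (1 - t)^m."""
    return (1.0 - t) ** m


def no_cover_bound(t, m):
    """Upper bound exp(-t m) obtained from (1 - t)^m <= exp(-t m)."""
    return math.exp(-t * m)


def log_no_cover_bound(n, mutual_info, g):
    """log of exp(-t m) for t = exp(-n(I + g)) and m = exp(n(I + 2g))."""
    log_t = -n * (mutual_info + g)
    log_m = n * (mutual_info + 2.0 * g)
    return -math.exp(log_t + log_m)  # equals -exp(n g): I cancels


if __name__ == "__main__":
    print(no_cover_exact(0.01, 1000), no_cover_bound(0.01, 1000))
    print(log_no_cover_bound(100, 0.5, 0.05))
```

The mutual information cancels in the product $tm$, so the failure probability is governed only by the rate slack $g_\epsilon$, which is why this term is negligible compared with the exponentially decaying binning and distortion terms.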
Lemma 17. For any positive definite covariance matrix K, 

< exp ( -n{D{K\\K) - o,(l) - 5^)) 



(72) 



where K is defined in (14) and 



and 5p = - log ( 1 \— - 0,(1) 

n \ noAl) 



-1 



Proof: Lemma [T5[ gives an upper bound for the probability density on £i, and £d. Applying this 

1 



lemma with ( [61] ) and ( [64[ ), we get 

Pr(T^n(^,U^,))< / /£(x,y 



Voi(r- 



a2(Kx)) 



:x)) 



dx.yz 



< j^^ exp ( - n{D{KxYm + /^(i^xy) - o,(l) 



X 1 



noe(l) 



Oe(l) exp(-n(/i(i^z|x) - Oe(l)))(ixyz 



Vol(r^) exp ( - n{D{KxYm + /i(irxy) - Oe(l))) 

o,{l) ) exp(-n(/i(ii'^|x) - o,(l)))rfxyz. 



X 1 



noe(l) 

Bounding the volume term using ( [62[ ) and applying the identity 

D(ir||i?) = D{KxYm + /^(i^zix) - /i(i^z|xy) 

gives the result. 

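Lemma 18. Let $\mathbf{y}$, $\mathbf{z}$ be strings with empirical correlation $\rho_{\mathbf{z}, \mathbf{y}}$ and let

$$A(\mathbf{z}, \mathbf{y}) = \{\tilde{\mathbf{z}} \in T^\epsilon_{\sigma^2} : \rho^2_{\tilde{\mathbf{z}}, \mathbf{y}} \ge \rho^2_{\mathbf{z}, \mathbf{y}}\}.$$

Then

$$\mathrm{Vol}(A(\mathbf{z}, \mathbf{y})) \le 2\exp\left(\frac{n}{2}\log\big(2\pi e \sigma^2(1 - \rho^2_{\mathbf{z}, \mathbf{y}})\big) + n \, o_\epsilon(1)\right).$$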

Proof: The empirical correlation does not change if we scale the vectors, so we may assume that 
z*z = y*y = n((T^ + e). Suppose z*y > 0, in which case 

A{z,y) = {z G r^2 : Pz,y > Pz,y} U {z G T^2 : pi,y < -Pz,y} 
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and by symmetry the two sets on the right hand side have the same volume and it suffices to consider 
one of them. 

{z e T^. : p,,y > p.,y} = {z e T^, : > 

Vz*zy*y Vz*zy*y 

= |z e T^, : -2 p,,yzV < -2 p,,yzV^| 

V z'^z 

= B{z,y) 

We now bound the volume of B{z,y). Let X ~ ^^(px.yy, — pl y)I). Then 
1> / /x(x)cix 

7B(z,y) 

To continue we upper bound the summand in the exponent as follows 

'^{xi - p^,yyif = x*x - 2p^,yX*y + p^yY^y 

< n{a'' + e) - 2p,,yxV + plyn{a^ + e) 

< n{a^ + e) - 2plyn{a^ + e)(l - o,{l)) + plyn{a^ + e) 
<nia'{l-ply) + o,{l)). 

Substituting the above into (*) gives 

Vol(S(z, y)) < exp (n{^ \og{2na'{l - pl^)) + ^ + o,(l))) 

- exp (n(^ Iog(2W(l - pl^)) + o,(l))) . 

Observing that an identical argument holds for z*y < we are done. ■ 
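Lemma 19. Let $(\mathbf{x}, \mathbf{y}, \mathbf{z}) \in (\mathcal{E}_c \cup \mathcal{E}_d)^c$. Then

$$\Pr\big(\tfrac{1}{n}\|X^n - \hat{X}^n\|^2 > \Delta \,\big|\, X^n = \mathbf{x}, Y^n = \mathbf{y}, Z^n = \mathbf{z}\big) \le \exp\big(-n |R - J(K) - o_\epsilon(1) - \delta_n|^+\big) \qquad (73)$$

where $K = K(i, j, k(i), r, s(i), t)$ is the type containing $(\mathbf{x}, \mathbf{y}, \mathbf{z})$ and

$$J(K) = I_K(X; Z) - I_K(Y; Z).$$

Moreover, if $\log|B^n(\nu(\mathbf{x}))| \le nR$ then

$$\Pr\big(\tfrac{1}{n}\|X^n - \hat{X}^n\|^2 > \Delta \,\big|\, X^n = \mathbf{x}, Y^n = \mathbf{y}, Z^n = \mathbf{z}\big) = 0.$$

Proof: Let $L$ be the event that $Z^n \ne \varphi_1(\psi(X^n), Y^n)$. Observe that $L$ occurs when the decoder decodes the wrong codeword and that $\Pr\big(\tfrac{1}{n}\|X^n - \hat{X}^n\|^2 > \Delta \,|\, X^n = \mathbf{x}, Y^n = \mathbf{y}, Z^n = \mathbf{z}\big)$ is upper bounded by $\Pr(L | X^n = \mathbf{x}, Y^n = \mathbf{y}, Z^n = \mathbf{z})$.
If $i$ is such that $\log|B^n(i)| \le nR$, then

$$\Pr(L | X^n = \mathbf{x}, Y^n = \mathbf{y}, Z^n = \mathbf{z}) = 0.$$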
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For the opposite we case we argue as follows. Quantize the set T^i^^^i-^^ in cells of diameter at-most 5 
so that each cell is either entirely typical with respect to x or not (except possibly the boundaries). We 
will now study 

Pr(L|X" = x,F" = y,Z"G [z]^), 

where [z]*^ is the cell containing the fixed z. In order that L occurs there must be a cell containing a 
codeword z such that p|y > Pz",y ^(^) = U{Z''''). Let -B(y, [z]*^) be the set of cells containing at 
least one z such that pl y > p^, y for some z' e [z]''. By summing over each cell (written as [z]) we have 

Pr(L|X" = X, = y, e [z]'') 

< J2 e [z]',U{B-{tm = U{Z-)\X- = ^,Y- = y,Z" e [z]^) (74) 

[z]<'eB(y,[z]«) 

[z]VN]^ 

+ Pr(3i : B^{i)[j] ^ Z^,B^{i)[j] e [z]' ,U{B^{i)\j]) = U{Z^)\X^ ^ ^,Y^ = y, e [z]'). (75) 

The inequality follows by the union bound and also because the right-hand side assumes that a codeword
in any of the cells in $B(\mathbf{y}, [\mathbf{z}]^\delta)$ will lead to an error.
We observe that

\[
\lim_{\delta \to 0} \Pr\left(\exists j : B^n(i)[j] \neq Z^n,\ B^n(i)[j] \in [\mathbf{z}]^\delta,\ U(B^n(i)[j]) = U(Z^n) \,\middle|\, X^n = \mathbf{x}, Y^n = \mathbf{y}, Z^n \in [\mathbf{z}]^\delta\right) = 0,
\]

because the probability of a cell containing two codewords is negligible as $\delta$ tends to zero.
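Turning now to the first summand, applying the union bound gives

\[
\begin{aligned}
&\Pr\left(\exists j : B^n(i)[j] \in [\tilde{\mathbf{z}}]^\delta,\ U(B^n(i)[j]) = U(Z^n) \,\middle|\, X^n = \mathbf{x}, Y^n = \mathbf{y}, Z^n \in [\mathbf{z}]^\delta\right) \\
&\quad\le \sum_{j} \Pr\left(B^n(i)[j] \in [\tilde{\mathbf{z}}]^\delta \,\middle|\, X^n = \mathbf{x}, Y^n = \mathbf{y}, Z^n \in [\mathbf{z}]^\delta\right) \\
&\qquad\quad \times \Pr\left(U(B^n(i)[j]) = U(Z^n) \,\middle|\, Z^n \in [\mathbf{z}]^\delta,\ B^n(i)[j] \in [\tilde{\mathbf{z}}]^\delta\right).
\end{aligned}
\]

Conditioned on $\{Z^n \in [\mathbf{z}]^\delta,\ B^n(i)[j] \in [\tilde{\mathbf{z}}]^\delta\}$, the chance that the two (necessarily different) codewords
share the same bin is $\exp(-nR)$ by the code construction. We will now show that

\[
\Pr\left(B^n(i)[j] \in [\tilde{\mathbf{z}}]^\delta \,\middle|\, X^n = \mathbf{x}, Y^n = \mathbf{y}, Z^n \in [\mathbf{z}]^\delta\right) \le \Pr\left(B^n(i)[j] \in [\tilde{\mathbf{z}}]^\delta \,\middle|\, X^n = \mathbf{x}, Y^n = \mathbf{y}\right); \tag{76}
\]

to establish this we will show

\[
\Pr\left(Z^n \in [\mathbf{z}]^\delta \,\middle|\, X^n = \mathbf{x}, Y^n = \mathbf{y}, B^n(i)[j] \in [\tilde{\mathbf{z}}]^\delta\right) \le \Pr\left(Z^n \in [\mathbf{z}]^\delta \,\middle|\, X^n = \mathbf{x}, Y^n = \mathbf{y}\right),
\]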

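which implies the inequality by reversing the conditioning. Suppose first that $[\mathbf{z}]^\delta$ was a cell not typical
with respect to $\mathbf{x}$. Observe that

\[
\begin{aligned}
&\Pr\left(Z^n \in [\mathbf{z}]^\delta \,\middle|\, X^n = \mathbf{x}, Y^n = \mathbf{y}, B^n(i)[j] \in [\tilde{\mathbf{z}}]^\delta\right) \\
&\quad= \Pr\left(Z^n \in [\mathbf{z}]^\delta,\ F \,\middle|\, X^n = \mathbf{x}, Y^n = \mathbf{y}, B^n(i)[j] \in [\tilde{\mathbf{z}}]^\delta\right) \\
&\quad= \Pr\left(F \,\middle|\, X^n = \mathbf{x}, Y^n = \mathbf{y}, B^n(i)[j] \in [\tilde{\mathbf{z}}]^\delta\right) \\
&\qquad \times \Pr\left(Z^n \in [\mathbf{z}]^\delta \,\middle|\, X^n = \mathbf{x}, Y^n = \mathbf{y}, B^n(i)[j] \in [\tilde{\mathbf{z}}]^\delta, F\right) \\
&\quad\le \Pr\left(F \,\middle|\, X^n = \mathbf{x}, Y^n = \mathbf{y}\right) \\
&\qquad \times \Pr\left(Z^n \in [\mathbf{z}]^\delta \,\middle|\, X^n = \mathbf{x}, Y^n = \mathbf{y}, B^n(i)[j] \in [\tilde{\mathbf{z}}]^\delta, F\right) \\
&\quad= \Pr\left(F \,\middle|\, X^n = \mathbf{x}, Y^n = \mathbf{y}\right) \\
&\qquad \times \Pr\left(Z^n \in [\mathbf{z}]^\delta \,\middle|\, X^n = \mathbf{x}, Y^n = \mathbf{y}, F\right).
\end{aligned}
\]

The inequality follows because $F$ has a greater chance to occur if one of the slots is not occupied by a
non-typical codeword. The final equality follows because $Z^n$ is independent of $\{B^n(i)[j] \in [\tilde{\mathbf{z}}]^\delta\}$ conditional
on $F$. Therefore

\[
\begin{aligned}
\Pr\left(Z^n \in [\mathbf{z}]^\delta \,\middle|\, X^n = \mathbf{x}, Y^n = \mathbf{y}, B^n(i)[j] \in [\tilde{\mathbf{z}}]^\delta\right)
&\le \Pr\left(Z^n \in [\mathbf{z}]^\delta,\ F \,\middle|\, X^n = \mathbf{x}, Y^n = \mathbf{y}\right) \\
&= \Pr\left(Z^n \in [\mathbf{z}]^\delta \,\middle|\, X^n = \mathbf{x}, Y^n = \mathbf{y}\right).
\end{aligned}
\]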

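To argue the case when $[\mathbf{z}]^\delta$ is a cell that is typical one may use a straightforward coupling argument.

Together these two cases establish (76). Now

\[
\Pr\left(B^n(i)[j] \in [\tilde{\mathbf{z}}]^\delta \,\middle|\, X^n = \mathbf{x}, Y^n = \mathbf{y}\right) = \int_{[\tilde{\mathbf{z}}]^\delta} \mathrm{Vol}\left(\mathcal{T}^n_{\sigma_z(k(i))}\right)^{-1} d\mathbf{z}',
\]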



as $\delta \to 0$ it follows that



Pr(5"W[j] G = x,r" = y) ^ Pr(5"(0[j] G = x, = y) 



— idz' I "v^n 



Vol(T^: 



(fc(i))> 



c/z'. 



Also as $\delta \to 0$ we have that



[z]^eB(y,[z]*) 
[z]*^[z]^ 



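Therefore in the limit as $\delta \to 0$,

\[
\begin{aligned}
\Pr(L \mid X^n = \mathbf{x}, Y^n = \mathbf{y}, Z^n = \mathbf{z})
&\le \int_{\mathbf{z}' \in \mathcal{A}(\mathbf{z},\mathbf{y})} |B^n(i)| \exp(-nR)\, \mathrm{Vol}\left(\mathcal{T}^n_{\sigma_z(k(i))}\right)^{-1} d\mathbf{z}' \\
&= \mathrm{Vol}(\mathcal{A}(\mathbf{z},\mathbf{y}))\, |B^n(i)| \exp(-nR)\, \mathrm{Vol}\left(\mathcal{T}^n_{\sigma_z(k(i))}\right)^{-1},
\end{aligned}
\]

where we used the set $\mathcal{A}(\mathbf{z},\mathbf{y})$ from Lemma 18. Now using the result of Lemma 18 and (61) we obtain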



< 2 1 



exp ^ — n^i? + /(y; z) 



2a4(A:(2))V' 



<2(l-^^^pll) exp(-n(i?-J(x,y,z)-o.(l) 



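Also, since $\Pr(L \mid X^n = \mathbf{x}, Y^n = \mathbf{y}, Z^n = \mathbf{z}) \le 1$ we get

\[
\Pr(L \mid X^n = \mathbf{x}, Y^n = \mathbf{y}, Z^n = \mathbf{z}) \le \exp\left(-n\left(R - J(\mathbf{x}, \mathbf{y}, \mathbf{z}) - o_\epsilon(1) - \delta_b\right)^+\right).
\]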
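To make the exponent in Lemma 19 concrete, the sketch below evaluates $J(K) = I_K(X;Z) - I_K(Y;Z)$ and the clipped exponent $(R - J(K) - \text{slack})^+$. It is an illustration under stated assumptions, not the paper's construction: it assumes the type $K$ is summarized by the correlation coefficients $\rho_{xz}$ and $\rho_{yz}$, that $I_K$ is the mutual information of jointly Gaussian scalars, $-\tfrac{1}{2}\log(1-\rho^2)$ in nats, and that the hypothetical parameter `slack` lumps together the $o_\epsilon(1)$ and $\delta_b$ terms.

```python
import math

def gaussian_mi(rho):
    """Mutual information (nats) of two jointly Gaussian scalars with correlation rho."""
    return -0.5 * math.log(1.0 - rho ** 2)

def J(rho_xz, rho_yz):
    """J(K) = I_K(X;Z) - I_K(Y;Z), assuming K is Gaussian with these correlations."""
    return gaussian_mi(rho_xz) - gaussian_mi(rho_yz)

def binning_exponent(R, rho_xz, rho_yz, slack=0.0):
    """Clipped exponent (R - J(K) - slack)^+; slack is a stand-in for o(1) + delta_b."""
    return max(R - J(rho_xz, rho_yz) - slack, 0.0)

# Illustrative values only (assumptions, not from the paper).
print(binning_exponent(1.0, 0.5, 0.5))   # J = 0, so the exponent equals R
print(binning_exponent(0.1, 0.99, 0.0))  # J exceeds R, so the exponent clips to 0
```

The clipping reflects the last step of the proof: a probability never exceeds one, so the bound is only informative when $R$ exceeds $J(K)$.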



Lemma 20. Let $\delta_p$, $\delta_b$ be sequences going to zero as $n \to \infty$,



Ex[(X-A(y,Z))2] > A-o,(l) 



G:(ir,E,A,A,i?) = 

{ D{K\\K) - o,{l) - 5, 
D{K\\K) - o,{l) - 5p 

+ {R- Ik{X- Z) Ek[(X - A(r, Z)f] < A - 0,(1) 
+Ik{Y- Z) - Oe(l) - 5bY and Ik{X; Z) > R - 0,(1) 

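$\infty$ otherwise,

\[
\pi^n_\epsilon(R, \Delta, \Sigma) = \min_{i} \max_{k,s} \min_{j} \max_{\lambda} \min_{r,t} G^n_\epsilon(K^\epsilon, \Sigma, \lambda, \Delta, R),
\]

and

\[
\pi(R, \Delta, \Sigma) = \inf_{\sigma_x} \sup_{\sigma_z,\rho_{xz}} \inf_{\sigma_y} \sup_{\lambda} \inf_{\rho_{xy},\rho_{yz}} G_G(K, \Sigma, \lambda, \Delta, R),
\]

where $K^\epsilon$ is shorthand for $K^\epsilon(i, j, k, r, s, t)$ and $K$ is a covariance matrix with entries $(\sigma_x, \sigma_y, \sigma_z, \rho_{xy}, \rho_{xz}, \rho_{yz})$.
Then

\[
\liminf_{\epsilon \to 0} \liminf_{n \to \infty} \pi^n_\epsilon(R, \Delta, \Sigma) \ge \pi(R, \Delta, \Sigma).
\]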

Proof: Let $\delta > 0$. For $\epsilon > 0$ define

G',(X,E,A,A,i?) = 

(D{K\\K) - 0,(1) Mk[{X - X{Y, Z)f\ > A - o,(l) 
D{K\\K)-o,{l) 

+ {R- Ik{X- Z) E^[(X - A(y, Z)Y] < A - 0,(1) 
+7^(y;Z)-o,(l))+ and 7k(^; Z) >i?- 0,(1) 

oo otherwise. 



and 



7rg(i?, A, E) = minmaxminmaxminGg(irg, E, A, A, R). 

i k,s j A r,t 



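Then for any choice of arguments and $n$ sufficiently large, $G'_\epsilon - G^n_\epsilon \le \delta/3$. Hence

\[
\liminf_{n \to \infty} \pi^n_\epsilon(R, \Delta, \Sigma) \ge \pi_\epsilon(R, \Delta, \Sigma) - \frac{\delta}{3}.
\]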

Via the use of the functions o'^(-) and ?7(-, ■, ■) we write the optimization above as follows 

7rg(i?, A, E) = min max min max min G^{K^, E, A, A, R), 

fx crz,pxz cry A Pxy,Pyz 

where the use of max and min is justified since we are optimizing over finite sets.
Take any sequence ^ 0. Let X^"*) = K^'^'>{a'p,a^^\pt\cr^^\p^^\p&) and A^"*) be such that 

n^R, A, E) = G,^(irM, E, A^-), A, R). 

By considering subsequences, we may assume that — )■ and A*^™) A°°. Then there exists 
af,p^, so that 

infsup inf GciKiax ,cTY,a'^ , pZ, Pxy, Pyz),^, K A, R) 

<^Y \ Pxy,Pyz 

> sup infsup inf Gg{K{(Tx ,(Ty,(7z, pxz, pxy, Pyz),^,K ^,R) - is 

(rz,Pxz A Pxy,Pyz 6 

and there are sequences pi^\ cr^^ converging to and a'z respectively. Let 

ctJ"^ eargminmax min G,^{K{a'^^\aY,a^^\p'^^\ pxy, Pyz),^,\ ^,R) 

ay ^ P'^y 'Py 

and by taking a further subsequence we can assume cr^^ — > dy- Then there exists A°° such that 

inf GG{K{a'^, a^, a^, pZ, P^y, Pyz), S, A~, A, R) 

PxyiPyz 

> sup inf GG{K{a^, a^, a^, p^, Pxy, Pyz), S, A, A, i?) - ^ 

X Pxy, Pyz O 
and we let $\lambda^{(m)}$ be a sequence converging to $\lambda^\infty$. Let



(pIT^ P&^) e argminG, Jif \ p1T\ P..), S, A(-), A, R). 



Pxyipyz 



Define K^^^ ^ ir(-)(4"\4"^\af then observe that 

= max minmax min Ge^{K''"''\a^x \'^Y,<yz, Pxz, Pxy, Pyz),^, ^, R) 
>minmax min G,„(ir(™)(4™\ cry, pi^), p,,, p,,), E, A, A, i?) 

A pxytpyz 

= max min 4"^ 4"^ pIT^ P.., Py.), S, A, A, i?) 

A pxyipyz 

> min a„(i^(™H4™^4™^4"^\piT\p.y,P..),S,A(-),A,/2) 

Pa;!/ iPyz 

=G,„(ir(™),S,A(™),A,i?) 

By examining the various cases and using the continuity of expectation and the information measures, 
one can show that 

liminfG,„(ir('"),S,A('"),i?,A) > G'G(i^'°°, S, A°^, i?. A). 



Furthermore, 



GG(i^'",S,A°°,i?,A) 

> inf GG(i^(a^,a??,a?,pr.,Pxy,Py.),S,A°^,i?,A) 

Pxz ipyz 

>sup inf GG(ir(a^,a??,a^,p^„p,.„p,,),S,A,A,i?)-^ 

\ PxytPyz O 

> inf sup inf GdKia'^, ay, a'^,pZ, Pxy, Pyz), S, A, A, R) - - 

o-y X Pxy, Pyz O 

2,5 

> sup inf sup inf GdK^ax , (Ty, (^z, pxz, Pxy, Pyz),^, \ A, R) - — 

<TZ,Pxz A Pxy,Pyz 6 

9A 

>7r(i?,A,S) - — 



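Hence

\[
\liminf_{\epsilon \to 0} \liminf_{n \to \infty} \pi^n_\epsilon(R, \Delta, \Sigma) \ge \pi(R, \Delta, \Sigma) - \delta.
\]

But $\epsilon \to 0$ and $\delta > 0$ were arbitrary.
Proof of Theorem 5: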

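\[
\begin{aligned}
\Pr\left(\tfrac{1}{n}\|X^n - \hat{X}^n\|^2 > \Delta\right)
&= \Pr\left(\tfrac{1}{n}\|X^n - \hat{X}^n\|^2 > \Delta \,\middle|\, (X^n, Y^n, Z^n) \in (\mathcal{R}^{3n})^c\right) \Pr\left((\mathcal{R}^{3n})^c\right) \\
&\quad + \Pr\left(\tfrac{1}{n}\|X^n - \hat{X}^n\|^2 > \Delta \,\middle|\, (X^n, Y^n, Z^n) \in \mathcal{R}^{3n}\right) \Pr\left(\mathcal{R}^{3n}\right) \\
&\le \int_{\mathcal{R}^{3n}} \Pr\left(\tfrac{1}{n}\|X^n - \hat{X}^n\|^2 > \Delta \,\middle|\, \mathbf{x}, \mathbf{y}, \mathbf{z}\right) dF(\mathbf{x}\mathbf{y}\mathbf{z}) + \Pr\left((\mathcal{R}^{3n})^c\right).
\end{aligned}
\tag{77}
\]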

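For now we focus on the integral and will deal with $\Pr((\mathcal{R}^{3n})^c)$ separately.
Observe first that the error probability on $(\mathcal{E}_b \cup \mathcal{E}_c \cup \mathcal{E}_d)^c$ is zero, thus we can split the integral
as follows, allowing us to deal with the various key events defined in Section E-B:

\[
\begin{aligned}
&\int_{\mathcal{E}_b \cap \mathcal{R}^{3n}} \Pr\left(\tfrac{1}{n}\|X^n - \hat{X}^n\|^2 > \Delta \,\middle|\, \mathbf{x}, \mathbf{y}, \mathbf{z}\right) dF(\mathbf{x}\mathbf{y}\mathbf{z}) \\
&\quad + \int_{\mathcal{E}_c \cap \mathcal{R}^{3n}} \Pr\left(\tfrac{1}{n}\|X^n - \hat{X}^n\|^2 > \Delta \,\middle|\, \mathbf{x}, \mathbf{y}, \mathbf{z}\right) dF(\mathbf{x}\mathbf{y}\mathbf{z}) \\
&\quad + \int_{\mathcal{E}_d \cap \mathcal{R}^{3n}} \Pr\left(\tfrac{1}{n}\|X^n - \hat{X}^n\|^2 > \Delta \,\middle|\, \mathbf{x}, \mathbf{y}, \mathbf{z}\right) dF(\mathbf{x}\mathbf{y}\mathbf{z}).
\end{aligned}
\]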

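Bounding the error probability on $\mathcal{E}_c$ and $\mathcal{E}_d$ by 1 gives

\[
\Pr(\mathcal{E}_c \cap \mathcal{R}^{3n}) + \Pr(\mathcal{E}_d \cap \mathcal{R}^{3n})
+ \int_{\mathcal{E}_b \cap \mathcal{R}^{3n}} \Pr\left(\tfrac{1}{n}\|X^n - \hat{X}^n\|^2 > \Delta \,\middle|\, \mathbf{x}, \mathbf{y}, \mathbf{z}\right) dF(\mathbf{x}\mathbf{y}\mathbf{z}). \tag{78}
\]

By Lemma 16, $\Pr(\mathcal{E}_c \cap \mathcal{R}^{3n})$ tends to zero double exponentially with the block length and can therefore
also be neglected. Let

\[
\mathcal{D}_d = \{K : \mathcal{T}_K \cap \mathcal{E}_d \neq \emptyset\}.
\]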

Then applying Lemma 17 gives 

exp{-n{D{K\\K) - o,{l) - 6p)) 
< > > iVxYyl max 

^ r,t:K(j,j,fc(i),r,s(i),t)eX'd 

exp(-n(D(ir||ir)-o,(l)-5p)). 



where we have written K for K^{i,j, k,r\ s{i),t) and likewise K for K^{i,j, k{i)^r\ s{i),t). Next let 

\[
\mathcal{D}_b = \{K : \mathcal{T}_K \cap \mathcal{E}_b \neq \emptyset\}.
\]



Addressing the integral in (78), 

f Pr (1 ||X" - > A|x, y, z) dF(xyz) 

< J2 I exp{-n{R-J{K)-o,{l)-6ty) 
(iF(xyz) 

< J2 ^wi-n{D{K\\K) - 0,(1) + {R- J{K) - o,(l) - 5^)+ - Sp)) 

expi-n{DiK\\K) - o,(l) + (i? - J(ir) - o,(l) - 5,)+ - 6,)) 

< > > iPt-vzl max 

^ J- ^ I (r.t):i^(j,j,A:(i),r,s(i),t)e»6 

* j 

exp{-n{D{K\\K) - 0,(1) + {R- J{K) - o,(l) - 5,)+ - 5^)), 
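where (a) follows from Lemma 19 and (b) follows from Lemma 17.
Turning to $(\mathcal{R}^{3n})^c$, using well-known large-deviations results for the Gaussian distribution, we obtain

\[
\begin{aligned}
\Pr\left((\mathcal{R}^{3n})^c\right) &\le 2\Pr\left(\mathbf{x}^*\mathbf{x} < n(M_L + \epsilon)\right) + 2\Pr\left(\mathbf{x}^*\mathbf{x} > nM_U\right) \\
&\le 2\exp\left(-\tfrac{n}{2}\left((M_L + \epsilon) - \log(M_L + \epsilon) - 1 - o_\epsilon(1)\right)\right) \\
&\quad + 2\exp\left(-\tfrac{n}{2}\left(M_U - \log M_U - 1 - o_\epsilon(1)\right)\right).
\end{aligned}
\]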
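The exponent $\tfrac{1}{2}(M - \log M - 1)$ in this bound is the standard Chernoff exponent for the squared norm of a Gaussian vector. The sketch below checks the upper-tail version against the exact tail probability. It assumes unit-variance i.i.d. coordinates (so that $\mathbf{x}^*\mathbf{x}$ is chi-square with $n$ degrees of freedom) and restricts to even $n$, where the tail has a closed form; the function names are hypothetical and the $o_\epsilon(1)$ term is ignored.

```python
import math

def chi2_upper_tail_even(n, t):
    """Exact Pr(chi^2_n >= t) for even n: exp(-t/2) * sum_{k < n/2} (t/2)^k / k!."""
    m = n // 2
    half = t / 2.0
    term, total = 1.0, 0.0
    for k in range(m):
        total += term            # adds (t/2)^k / k!
        term *= half / (k + 1)   # advance to the next Poisson term
    return math.exp(-half) * total

def chernoff_upper(n, M):
    """Chernoff bound exp(-(n/2)(M - log M - 1)) on Pr(chi^2_n >= n M), for M > 1."""
    return math.exp(-0.5 * n * (M - math.log(M) - 1.0))

for n in (2, 10, 40):
    for M in (1.5, 2.0, 3.0):
        exact = chi2_upper_tail_even(n, n * M)
        bound = chernoff_upper(n, M)
        print(n, M, exact, bound, exact <= bound)
```

Because the exponent grows without bound as $M_U \to \infty$ (and likewise for the lower tail as the threshold tends to zero), the thresholds can be pushed far enough out that this term decays faster than the exponents of interest.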

Now $M_U$ and $M_L$ can be chosen so that this term does not dominate the exponent and can therefore be
neglected. Combining the various bounds (and neglecting the terms in the previous equation) gives

Pr Q||X"-X"||2 > A^7^3") 

< E \^XYz\\^ max eM-n{D{K\\K) -5,- o,(l))) 
+ max exp{-n{D{K\\K)-oJl) + {R-J{K)-oJl) 

Using the formula $a + b \le 2\max(a, b)$, we can upper bound the quantity in square brackets by
2max( max exp(— n(L>(i<'| IX) — 5p — 0,(1))), 



max exp{-n{D{K\\K) - o,(l) + {R - J{K) - o,(l) - 5^)+) - 5p) 



Note that the sets $\mathcal{D}_b$ and $\mathcal{D}_d$ may overlap. However, without loss of generality, we may assume that the
$o_\epsilon(1)$ terms are such that the objective in the $\mathcal{D}_d$ max is no smaller than the objective in the $\mathcal{D}_b$ max. This
quantity can then be further upper bounded by replacing the maximum over $(r, t)$ such that $K \in \mathcal{D}_b$ with
a maximum over $(r, t)$ such that $K \in \mathcal{D}_b \setminus \mathcal{D}_d$. This yields

\[
2|\mathcal{P}_{XYZ}| \max_{(r,t)} H(K),
\]

with $H(K) = \exp(-nG^n_\epsilon(K))$, where $G^n_\epsilon(K)$ is as in Lemma 20.
Thus

\[
\Pr\left(\tfrac{1}{n}\|X^n - \hat{X}^n\|^2 > \Delta\right) \le \sum_{i} \sum_{j} 2|\mathcal{P}_{XYZ}| \max_{(r,t)} H(K).
\]

Since $\lambda$ and the choice of the test channel were arbitrary, the right-hand side is upper bounded by

'2\VxYz\'^ max min max min ma.xH{K). 

i k,s j A r,t 



We then take logs, divide by $n$, and let $n$ tend to infinity and $\epsilon$ tend to zero, invoking Lemma 20 to
obtain the desired result. ■

Appendix F 
Proof of Theorem 6

Proof:

Let $(f^n, g^n)$ be a code for the two-sided Gaussian rate distortion problem with conditional rate distortion
function $R_{X|Y}$ and define

\[
\mathcal{E}^n_\Delta = \left\{(\mathbf{x}, \mathbf{y}) : \|\mathbf{x} - g^n(f^n(\mathbf{x}, \mathbf{y}), \mathbf{y})\|^2 > n\Delta\right\}
\]

and

\[
\mathcal{E}^n_K = \left\{(\mathbf{x}, \mathbf{y}) : nK > \|\mathbf{x} - g^n(f^n(\mathbf{x}, \mathbf{y}), \mathbf{y})\|^2\right\},
\]

where $K \in \mathbb{R}_+$ is to be specified later. For $R$ fixed, choose a covariance matrix $\Pi$ so that

\[
R_{X|Y}(f_\Pi, \Delta) > R. \tag{79}
\]

Let $\Delta'$ be the solution to $R_{X|Y}(f_\Pi, \Delta') = R$ and define $\Delta(f^n, g^n) = E_\Pi\left[\tfrac{1}{n}\|X^n - g^n(f^n(X^n, Y^n), Y^n)\|^2\right]$.
Then according to [36, Section 4]

\[
R_{X|Y}(f_\Pi, \Delta) > R = R_{X|Y}(f_\Pi, \Delta') \ge R_{X|Y}(f_\Pi, \Delta(f^n, g^n)) \tag{80}
\]

for every $n$ and code $(f^n, g^n)$ with rate at most $R$. Monotonicity of the rate distortion function implies
that $\Delta(f^n, g^n) \ge \Delta' > \Delta$.

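To continue we modify our original code to give $(\tilde{f}^n, \tilde{g}^n)$. The modification comprises adding a new
codeword such that the decoder emits the all-zero string on receipt of this codeword. Encoder $\tilde{f}^n$, knowing
the side information, can choose to send this codeword if the choice by $f^n$ results in a higher distortion
than the all-zero string would. If we let $\hat{X}^n = g^n(f^n(X^n, Y^n), Y^n)$ and $\tilde{X}^n = \tilde{g}^n(\tilde{f}^n(X^n, Y^n), Y^n)$ then we see that
$n^{-1}\|X^n - \tilde{X}^n\|^2 \le \tfrac{1}{n}\|X^n\|^2$ a.s. Modifying the code in this way only reduces the squared error, hence
defining

\[
\tilde{\mathcal{E}}^n_\Delta = \left\{(\mathbf{x}, \mathbf{y}) : \|\mathbf{x} - \tilde{g}^n(\tilde{f}^n(\mathbf{x}, \mathbf{y}), \mathbf{y})\|^2 > n\Delta\right\}
\]

(and correspondingly $\tilde{\mathcal{E}}^n_K$) we see that $\mathcal{E}^n_\Delta \supseteq \tilde{\mathcal{E}}^n_\Delta$. In the following all expectations and probabilities are
with respect to the law $f_\Pi$ unless stated otherwise.

\[
E\left[\|X^n - \tilde{X}^n\|^2\, 1_{\tilde{\mathcal{E}}^n_\Delta \cap (\tilde{\mathcal{E}}^n_K)^c}\right] \le E\left[\|X^n\|^2\, 1_{\tilde{\mathcal{E}}^n_\Delta \cap (\tilde{\mathcal{E}}^n_K)^c}\right]
\le E\left[\|X^n\|^2\, 1\{\|X^n\|^2 > nK\}\right].
\]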

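Next, applying the Cauchy-Schwarz inequality gives

\[
\begin{aligned}
E\left[\|X^n\|^2\, 1\{\|X^n\|^2 > nK\}\right]
&\le \sqrt{E\left[\|X^n\|^4\right] \Pr\left(\|X^n\|^2 > nK\right)} \\
&= \sqrt{E\left[\sum_{i=1}^{n} \sum_{j=1}^{n} X_i^2 X_j^2\right] \Pr\left(\|X^n\|^2 > nK\right)} \\
&= \sqrt{\left(nE[X_1^4] + (n^2 - n)E[X_1^2]E[X_2^2]\right) \Pr\left(\|X^n\|^2 > nK\right)}.
\end{aligned}
\]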
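The fourth-moment identity used in the last step, $E\|X^n\|^4 = nE[X_1^4] + (n^2 - n)E[X_1^2]E[X_2^2]$, relies only on the coordinates being i.i.d. The sketch below checks it exactly by brute-force enumeration over a small discrete distribution; the three-point support is an arbitrary assumption chosen to keep the enumeration cheap, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def norm_fourth_moment(n, m2, m4):
    """E[||X^n||^4] for i.i.d. coordinates with E[X^2] = m2 and E[X^4] = m4."""
    return n * m4 + (n * n - n) * m2 * m2

def brute_force_fourth_moment(n, support):
    """Exact E[(sum_i X_i^2)^2] for X_i i.i.d. uniform on `support`."""
    total = Fraction(0)
    for xs in product(support, repeat=n):
        total += Fraction(sum(x * x for x in xs)) ** 2
    return total / len(support) ** n

support = [-1, 0, 2]  # arbitrary illustrative support
m2 = Fraction(sum(x ** 2 for x in support), len(support))
m4 = Fraction(sum(x ** 4 for x in support), len(support))
print(norm_fourth_moment(3, m2, m4), brute_force_fourth_moment(3, support))
```

For a zero-mean Gaussian coordinate with variance $\sigma^2$ one has $E[X^4] = 3\sigma^4$, so the formula reduces to $n(n+2)\sigma^4$, which grows like $n^2$ and is what makes the subsequent normalization by $n$ work.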



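Choosing $K = E[X_1^2] + \epsilon$ and applying Chebyshev's inequality to the probability allows us to further
bound this quantity by

\[
\sqrt{\left(nE[X_1^4] + (n^2 - n)E[X_1^2]E[X_2^2]\right) \frac{E[X_1^4] - E[X_1^2]^2}{n\epsilon^2}}.
\]

Hence

\[
E\left[\tfrac{1}{n}\|X^n - \tilde{X}^n\|^2\, 1_{\tilde{\mathcal{E}}^n_\Delta \cap (\tilde{\mathcal{E}}^n_K)^c}\right]
\le \sqrt{\left(n^{-1}E[X_1^4] + (1 - n^{-1})E[X_1^2]E[X_2^2]\right) \frac{E[X_1^4] - E[X_1^2]^2}{n\epsilon^2}},
\]

which goes to zero with $n$. We note that this new code has rate $\tilde{R} = R + n^{-1}\log(1 + \exp(-nR)) = R + o_n(1)$.
Let $\Delta' - \Delta > \delta_1 > 0$, and let $\tilde{\Delta}$ be the solution to $\tilde{R} = R_{X|Y}(f_\Pi, \tilde{\Delta})$. Then for $n$ sufficiently large

\[
\Delta' - \tilde{\Delta} < \delta_1.
\]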
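The rate penalty for the extra codeword is easy to quantify. The sketch below computes $\tilde{R} - R = n^{-1}\log(1 + \exp(-nR))$, assuming rates are measured in nats and that the original codebook has exactly $\exp(nR)$ codewords; the function names are hypothetical.

```python
import math

def rate_gap(R, n):
    """Extra rate (nats) from adding one codeword to a codebook of exp(n R) codewords."""
    # (1/n) log(exp(nR) + 1) - R, computed stably as log1p(exp(-nR)) / n
    return math.log1p(math.exp(-n * R)) / n

def augmented_rate(R, n):
    """Rate of the modified code: R + (1/n) log(1 + exp(-nR))."""
    return R + rate_gap(R, n)

for n in (1, 10, 100):
    print(n, rate_gap(0.5, n))
```

The gap decays faster than exponentially in $n$ relative to any fixed tolerance, which is why $\tilde{\Delta}$ can be brought within $\delta_1$ of $\Delta'$ by taking $n$ large.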

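We also note that $\Delta(\tilde{f}^n, \tilde{g}^n) \ge \tilde{\Delta}$. One may decompose the space into different events to see that

\[
\begin{aligned}
\Delta(\tilde{f}^n, \tilde{g}^n) &= E\left[n^{-1}\|X^n - \tilde{X}^n\|^2\right] \\
&= E\left[n^{-1}\|X^n - \tilde{X}^n\|^2\, 1_{(\tilde{\mathcal{E}}^n_\Delta)^c}\right]
+ E\left[n^{-1}\|X^n - \tilde{X}^n\|^2\, 1_{\tilde{\mathcal{E}}^n_\Delta \cap \tilde{\mathcal{E}}^n_K}\right]
+ E\left[n^{-1}\|X^n - \tilde{X}^n\|^2\, 1_{\tilde{\mathcal{E}}^n_\Delta \cap (\tilde{\mathcal{E}}^n_K)^c}\right] \\
&\le \Delta \Pr\left((\tilde{\mathcal{E}}^n_\Delta)^c\right) + K \Pr\left(\tilde{\mathcal{E}}^n_\Delta \cap \tilde{\mathcal{E}}^n_K\right)
+ E\left[n^{-1}\|X^n - \tilde{X}^n\|^2\, 1_{\tilde{\mathcal{E}}^n_\Delta \cap (\tilde{\mathcal{E}}^n_K)^c}\right] \\
&\le \Delta\left(1 - \Pr(\tilde{\mathcal{E}}^n_\Delta)\right) + K \Pr(\tilde{\mathcal{E}}^n_\Delta)
+ E\left[n^{-1}\|X^n - \tilde{X}^n\|^2\, 1_{\tilde{\mathcal{E}}^n_\Delta \cap (\tilde{\mathcal{E}}^n_K)^c}\right],
\end{aligned}
\]

i.e.

\[
\Pr(\tilde{\mathcal{E}}^n_\Delta) \ge \frac{\Delta(\tilde{f}^n, \tilde{g}^n) - \Delta - E\left[n^{-1}\|X^n - \tilde{X}^n\|^2\, 1_{\tilde{\mathcal{E}}^n_\Delta \cap (\tilde{\mathcal{E}}^n_K)^c}\right]}{K - \Delta}.
\]

Thus

\[
\Pr(\tilde{\mathcal{E}}^n_\Delta) \ge \frac{\tilde{\Delta} - \Delta - E\left[n^{-1}\|X^n - \tilde{X}^n\|^2\, 1_{\tilde{\mathcal{E}}^n_\Delta \cap (\tilde{\mathcal{E}}^n_K)^c}\right]}{K - \Delta}
\ge \frac{\Delta' - \Delta - \delta_2}{K - \Delta} =: \alpha > 0
\]

for all $n > n_1$ (where $\delta_2 = \delta_1 + E\left[n^{-1}\|X^n - \tilde{X}^n\|^2\, 1_{\tilde{\mathcal{E}}^n_\Delta \cap (\tilde{\mathcal{E}}^n_K)^c}\right]$). Next, we set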



By the law of large numbers. 



(x,y): 



-iogf^-i^(nilE) 

^ /E(x,y) 



I 



/n(x,y)dxy > 1 - -a 



for all n sufficiently large. Combining everything, this gives 

/e(x, y)(ixy 



> 



/s(x,y)dxy 



[ /n(x, y) exp ( - log {"j^'*^? j (^xy 



> -aexp(-n(r'(n||E)+53)). 



(81) 
We observe that this inequality holds for all codes of rate at most $R$ and $\Pi$ satisfying (79). To complete
the proof it suffices to show that

\[
\lim_{\epsilon \to 0}\ \inf_{\Pi : R(\Pi, \Delta) > R + \epsilon} D(\Pi \| \Sigma) = \inf_{\Pi : R(\Pi, \Delta) > R} D(\Pi \| \Sigma).
\]

The first direction ($\ge$) is obvious. For the reverse inequality, choose $\Pi^*$ to achieve within $\delta$ of the
infimum on the right-hand side. Let $\Pi^{(\epsilon)}$ be a collection of covariance matrices converging to $\Pi^*$ such
that $R(\Pi^{(\epsilon)}, \Delta) > R + \epsilon$. That such a choice is possible follows by continuity of the rate distortion function.
Then

\[
\lim_{\epsilon \to 0}\ \inf_{\Pi : R(\Pi, \Delta) > R + \epsilon} D(\Pi \| \Sigma) \le \lim_{\epsilon \to 0} D(\Pi^{(\epsilon)} \| \Sigma) = D(\Pi^* \| \Sigma) \le \inf_{\Pi : R(\Pi, \Delta) > R} D(\Pi \| \Sigma) + \delta
\]

by continuity of relative entropy. But $\delta$ was arbitrary. ■
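The continuity of relative entropy invoked in the last step can be seen from the closed form for zero-mean Gaussians, $D(\Pi \| \Sigma) = \tfrac{1}{2}\left[\mathrm{tr}(\Sigma^{-1}\Pi) - k - \log\det(\Sigma^{-1}\Pi)\right]$. The sketch below evaluates this for $k = 2$. It assumes both matrices are symmetric positive definite $2 \times 2$ covariance matrices given as nested lists, uses nats, and the function name and example matrices are hypothetical.

```python
import math

def kl_gauss_2x2(P, S):
    """D(N(0,P) || N(0,S)) in nats for 2x2 positive definite covariance matrices."""
    (a, b), (c, d) = S
    det_s = a * d - b * c
    det_p = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    # Inverse of S via the 2x2 adjugate formula.
    inv = [[d / det_s, -b / det_s], [-c / det_s, a / det_s]]
    # trace(S^{-1} P)
    tr = sum(inv[i][k] * P[k][i] for i in range(2) for k in range(2))
    return 0.5 * (tr - 2.0 - math.log(det_p / det_s))

identity = [[1.0, 0.0], [0.0, 1.0]]
correlated = [[1.0, 0.5], [0.5, 1.0]]
print(kl_gauss_2x2(correlated, identity))
# A small perturbation of the first argument changes the divergence only slightly.
perturbed = [[1.0, 0.5 + 1e-6], [0.5 + 1e-6, 1.0]]
print(kl_gauss_2x2(perturbed, identity))
```

Since the expression is built from the trace, determinant, and logarithm, it is continuous in $\Pi$ wherever $\Pi$ stays positive definite, which is what allows the limit along $\Pi^{(\epsilon)} \to \Pi^*$ to be taken inside $D(\cdot \| \Sigma)$.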